SlideShare a Scribd company logo
1 of 27
A delineating procedure to retrieve
publication data in research areas – the case
of nanocellulose
Dr. Douglas Milanez
Centre for Technological Information in Materials/Federal University of São Carlos/Brazil
Dr. Ed Noyons
Centre for Science and Technology Studies /Leiden University/The Netherlands
15th International Conference on Scientometrics and
Informetrics- Stambul, 2015
The challenge to delineate
scientific research areas
• What is a research area, a field, discipline, topic?
• Global or local delineation?
• Classification systems  indispensable tool
– Structure and dynamics of scientific fields
– “As old as science”
• Database classification
– Journal assignment
– Multidisciplinary/ general journals?
1
The challenge to delineate
scientific research areas
• Classification systems  publication-level
initiatives:
– Text Similarity (Boyack et al., 2011)
• Medline database (2.15 million publications)
• More precise than the Medical Subject Headings
– Citation approach (Waltman & van Eck, 2012)
• Web of Science database (SCI and SSCI)
• Clustering algorithm and further criteria
• Labels extracted by text mining titles an abstracts
2
This study aims at proposing a delineation
procedure to retrieve publication data to
represent research areas
Nanocellulose
research
Case study
3
4
Nanocellulose
• Sustainable nanomaterial
– Any natural sources
– Bacterial fermentation
• Two types
– cellulose nanofibrils
– cellulose nanocrystals
• Great potential for innovation
– Composite materials
– Packing material
– Electronic devices
– Texturizing agent
– Medical devices
From Wikipedia, the free encyclopedia
Nanocellulose is a term referring to nano-structured
cellulose.
This may be either cellulose nanofibers (CNF) also called
microfibrillated cellulose (MFC), nanocrystalline
cellulose (NCC), or bacterial nanocellulose, which refers
to nano-structured cellulose produced by bacteria.
5
But …
• There is no subject category to define nanocellulose
• The area is difficult to capture by keywords:
– Too narrow
– Too broad
6
Proposed approach
(a bit local and global)
• Use keyword search;
• Define core set: seed publications;
• Locate these seeds in classification;
• Check results;
• Define area by selection clusters from classification.
7
• Overall delineation procedure
8
Methodology
• Establishing a search
expression
• Web of Science
online database
Initial set of
publication
• CWTS Publication-
level classification
system
• Lowest level
Prior research
areas retrieving • Cluster content
analysis
• Cleaning terms
establishment
Cleaning initial
set of publication
• Final set of
publication
• Selecting relevant
research areas
Final retrieving
and selection
9
CWTS Classification System
10
CWTS Classification System
Selecting research topics
Initial set of nanocellulose publications
(seeds)
• Terms found on literature and from expert suggestions
• Web of Science
• Search expression:
11
("bacterial cellulos*") OR ("cellulos* crystal*") OR ("cellulos* nanocrystal*") OR
("cellulos* whisker*") OR ("cellulos* microcrystal*") OR ("cellulos* nanowhisker*")
OR ("nanocrystal* cellulos*") OR ("cellulos* nano-whisker*") OR ("cellulos* nano-
crystal*") OR ("nano-crystal cellulos*") OR ("cellulos* micro-crystal*") OR
("cellulos* microfibril*") OR ("microfibril* cellulos*") OR ("cellulos* nanofibril*") OR
("nanofibril* cellulos*") OR ("micro-fibril* cellulos*") OR ("nano-fibril* cellulos*")
OR ("cellulos* micro-fibril*") OR ("cellulos* nano-fibril*") OR ("cellulos*
nanofiber*") OR ("nanocellulos*") OR ("cellulos* nanoparticle*") OR ("nano-
cellulos*") OR ("nanoparticl* cellulos*") OR ("nanosiz* cellulos*") OR ("cellulos*
nanofill*") OR ("nano-siz* cellulos*") OR ("cellulos* nano-fiber*") OR ("cellulos*
nano-particle*") OR ("cellulos* nano-fill*") OR ("nano-particl* cellulos*"))
12
Prior retrieve of nanocellulose research
areas
0
250
500
750
1000
1250
1500
1750
2000
2250
2500
2750
3000
0 25 50 75 100 125 150 175 200 225 250 275 300 325 350 375 400 425 450 475 500 525 550
NumberofPublications
Cluster (1- 533)
533
research
areas!
• 80% of clusters contained less than
three publications from the initial core
• Which clusters are relevant?
What information do we have per
cluster?
A. Number of publications in a cluster;
B. Number of publications retrieved by search strategy
(seed set);
C. Number of publications from seed per cluster.
13
First results
• Two clusters  Nuclei research topics
– Contained 56.3% of the seed nanocellulose set;
– Their (automated) labels relate to nanocellulose terms
• Cellulose Nanocrystals and Cellulose Nanofibrils
• Bacterial Cellulose (type of cellulose nanocrystal)
Analysis of the nanocellulose pub. on peripheral clusters
14
NucleiPeripheral
Analyzing the relevance of the peripheral
research areas
• approach: check if an article was focusing on
nanocellulose or not:
– Reading each title (and abstract if needed)
– Nanocellulose should be the main subject
– Only articles from the seed set of nanocellulose publications
included on the cluster were checked
– A total of 20 peripheral clusters were checked
15
Percentage of noise of core-nanocellulose
publications on peripheral research areas
0,0
5,0
10,0
15,0
20,0
25,0
30,0
35,0
40,0
45,0
50,0
0,0
10,0
20,0
30,0
40,0
50,0
60,0
70,0
80,0
90,0
100,0
5.1.21
5.1.4
13.19.17
13.6.2
13.19.3
13.6.1
9.13.4
13.6.33
13.6.43
13.6.7
18.5.4
13.6.17
13.6.22
13.6.12
13.6.9
13.6.31
13.6.29
1.33.9
13.18.1
13.11.41
Percentageproportion
Percentageofnoisepublications
Research area
Left Axis Right Axis
16
Extremely
noised
research
areas
Improving the initial seed set of
nanocellulose publications
• Selecting terms from title and abstract:
text mining approach
• VOSviewer
• Terms to ‘clean’ the first set of
nanocellulose publications:
17
"gene" OR "xyloglucan" OR "microtubule" OR "*cyto*" OR "kinesi" OR "tubulin" OR
"*cell wall*" OR "spindle" OR "phragmoplast" OR "mitosis" OR "preprophase" OR
"phenotype" OR "*plant growth*" OR "meiosi" OR "*lignin distribution*" OR
"delignification" OR "hemicellulose" OR "saccharification" OR "ethanol yield" OR
"lignocellulos*" OR "glucosidase" OR "xylanase"
Effect of cleaning the initial set of
nanocellulose publications
• Checking performed on peripheral research topics
18
0
20
40
60
80
100
120
140
160
5.1.21
5.1.4
13.19.17
13.6.2
13.19.3
13.6.1
9.13.4
13.6.33
13.6.43
13.6.7
18.5.4
13.6.17
13.6.22
13.6.12
13.6.9
13.6.31
13.6.29
1.33.9
13.18.1
13.11.41
Numberofpublications
Research areas
Before cleaning After cleaning
Research
areas
related to
biology
Effect of cleaning the initial set of
nanocellulose publications
• Checking performed on nuclei research topics
19
0,0 1,0 2,0 3,0 4,0 5,0 6,0
"*cell wall*"
"*cyto*"
"*lignin distribution*"
"*plant growth*"
"delignification"
"ethanol yield"
"gene"
"glucosidase"
"hemicellulose"
"kinesi"
"lignocellulos*"
"meiosi"
"microtubule"
"mitosis"
"phenotype"
"phragmoplast"
"preprophase"
"saccharification"
"spindle"
"tubulin"
"xylanase"
"xyloglucan"
Percentage decrease
13.6.3
13.6.11
Cleaning
terms
The cleaning step
did not affected
significatively the
nuclei research
areas
Independence test
• Effect of cleaning procedure on the top five authors
• They are important researchers in the subject of nanocellulose
• Relevant authors to nanocellulose research
• Their position as the top authors did not changed at all
– Except for author E, who went down to the seventh position
20
Author
Number of publication Decrease (%)
Before After Overall Nuclei
A 87 78 -10,3 -6,33
B 51 40 -21,6 -14,9
C 50 43 -14 0
D 50 39 -22 -18,2
E 48 29 -39,6 -26,5
Final retrieving
• 2,600 nanocellulose publications (core-
nanocellulose)
– 428 research areas
– 81.0% clusters with one or two publications from the core-
nanocellulose
• Are they relevant?
• Selecting relevant research areas
– Pareto Principle (or 80/20 rule)
– Research areas with one or two publications not considered
• 85 clusters
21
22
Selecting relevant research areas
• Pareto Principle:
0,0
10,0
20,0
30,0
40,0
50,0
60,0
70,0
80,0
90,0
100,0
0 10 20 30 40 50 60 70 80 90
Cumulativepercentagenumber
Reserch areas
12
Research
Topics
23
Map of the Nanocellulose
main research topics
(local map of globally identified topics, c.f. Boyack)
Nuclei areas
• Delineating scientific fields is a complex task:
– Boundaries are not frequently well established since scientific studies
– More and more exchange of knowledge between scientists from
different disciplines is involved.
• Our approach retrieves and delineates the nuclei and the
peripheral research areas concerning nanocellulose
research
24
Conclusion and next steps
• Study the knowledge flow from peripheral research
topics to the nuclei areas.
• Map how they provide the necessary knowledge to face
nanocellulose current challenges and how country and
scientific institutions are contributing to this evolution.
25
Conclusion and next steps
Dank u wel!
Muito Obrigado!
Thank you!
Questions?
Suggestions?

More Related Content

Viewers also liked

Full counting vs. fractional counting
Full counting vs. fractional countingFull counting vs. fractional counting
Full counting vs. fractional countingNees Jan van Eck
 
Large-scale analysis of bibliometric networks
Large-scale analysis of bibliometric networksLarge-scale analysis of bibliometric networks
Large-scale analysis of bibliometric networksNees Jan van Eck
 
Large-scale analysis of bibliometric data sources
Large-scale analysis of bibliometric data sourcesLarge-scale analysis of bibliometric data sources
Large-scale analysis of bibliometric data sourcesNees Jan van Eck
 
Advanced bibliometric software tools for publishers and editors
Advanced bibliometric software tools for publishers and editorsAdvanced bibliometric software tools for publishers and editors
Advanced bibliometric software tools for publishers and editorsNees Jan van Eck
 
VOSviewer and CitNetExplorer Tutorial
VOSviewer and CitNetExplorer TutorialVOSviewer and CitNetExplorer Tutorial
VOSviewer and CitNetExplorer TutorialNees Jan van Eck
 
A new software tool for large-scale analysis of citation networks
A new software tool for large-scale analysis of citation networksA new software tool for large-scale analysis of citation networks
A new software tool for large-scale analysis of citation networksNees Jan van Eck
 
Network visualization: Fine-tuning layout techniques for different types of n...
Network visualization: Fine-tuning layout techniques for different types of n...Network visualization: Fine-tuning layout techniques for different types of n...
Network visualization: Fine-tuning layout techniques for different types of n...Nees Jan van Eck
 
NESSHI and GEPHI: sociology of science as a breeding ground for tool building...
NESSHI and GEPHI: sociology of science as a breeding ground for tool building...NESSHI and GEPHI: sociology of science as a breeding ground for tool building...
NESSHI and GEPHI: sociology of science as a breeding ground for tool building...Clement Levallois
 
Applications of community detection in bibliometric network analysis
Applications of community detection in bibliometric network analysisApplications of community detection in bibliometric network analysis
Applications of community detection in bibliometric network analysisNees Jan van Eck
 
A systematic empirical comparison of different approaches for normalizing cit...
A systematic empirical comparison of different approaches for normalizing cit...A systematic empirical comparison of different approaches for normalizing cit...
A systematic empirical comparison of different approaches for normalizing cit...Nees Jan van Eck
 
From econometrics to bibliometrics
From econometrics to bibliometricsFrom econometrics to bibliometrics
From econometrics to bibliometricsLudo Waltman
 
VOSviewer and CitNetExplorer: Software tools for bibliometric analysis of s...
VOSviewer and CitNetExplorer: Software tools for bibliometric analysis of s...VOSviewer and CitNetExplorer: Software tools for bibliometric analysis of s...
VOSviewer and CitNetExplorer: Software tools for bibliometric analysis of s...Nees Jan van Eck
 
Bibliometric network analysis: Software tools, techniques, and an analysis o...
Bibliometric network analysis: Software tools, techniques, and an analysis o...Bibliometric network analysis: Software tools, techniques, and an analysis o...
Bibliometric network analysis: Software tools, techniques, and an analysis o...Nees Jan van Eck
 
Getting started with CitNetExplorer
Getting started with CitNetExplorerGetting started with CitNetExplorer
Getting started with CitNetExplorerNees Jan van Eck
 
CWTS Leiden Ranking: An advanced bibliometric approach to university ranking
CWTS Leiden Ranking: An advanced bibliometric approach to university rankingCWTS Leiden Ranking: An advanced bibliometric approach to university ranking
CWTS Leiden Ranking: An advanced bibliometric approach to university rankingNees Jan van Eck
 

Viewers also liked (17)

Full counting vs. fractional counting
Full counting vs. fractional countingFull counting vs. fractional counting
Full counting vs. fractional counting
 
VOSviewer
VOSviewerVOSviewer
VOSviewer
 
Large-scale analysis of bibliometric networks
Large-scale analysis of bibliometric networksLarge-scale analysis of bibliometric networks
Large-scale analysis of bibliometric networks
 
Large-scale analysis of bibliometric data sources
Large-scale analysis of bibliometric data sourcesLarge-scale analysis of bibliometric data sources
Large-scale analysis of bibliometric data sources
 
Advanced bibliometric software tools for publishers and editors
Advanced bibliometric software tools for publishers and editorsAdvanced bibliometric software tools for publishers and editors
Advanced bibliometric software tools for publishers and editors
 
VOSviewer and CitNetExplorer Tutorial
VOSviewer and CitNetExplorer TutorialVOSviewer and CitNetExplorer Tutorial
VOSviewer and CitNetExplorer Tutorial
 
A new software tool for large-scale analysis of citation networks
A new software tool for large-scale analysis of citation networksA new software tool for large-scale analysis of citation networks
A new software tool for large-scale analysis of citation networks
 
Network visualization: Fine-tuning layout techniques for different types of n...
Network visualization: Fine-tuning layout techniques for different types of n...Network visualization: Fine-tuning layout techniques for different types of n...
Network visualization: Fine-tuning layout techniques for different types of n...
 
NESSHI and GEPHI: sociology of science as a breeding ground for tool building...
NESSHI and GEPHI: sociology of science as a breeding ground for tool building...NESSHI and GEPHI: sociology of science as a breeding ground for tool building...
NESSHI and GEPHI: sociology of science as a breeding ground for tool building...
 
Applications of community detection in bibliometric network analysis
Applications of community detection in bibliometric network analysisApplications of community detection in bibliometric network analysis
Applications of community detection in bibliometric network analysis
 
A systematic empirical comparison of different approaches for normalizing cit...
A systematic empirical comparison of different approaches for normalizing cit...A systematic empirical comparison of different approaches for normalizing cit...
A systematic empirical comparison of different approaches for normalizing cit...
 
From econometrics to bibliometrics
From econometrics to bibliometricsFrom econometrics to bibliometrics
From econometrics to bibliometrics
 
VOSviewer and CitNetExplorer: Software tools for bibliometric analysis of s...
VOSviewer and CitNetExplorer: Software tools for bibliometric analysis of s...VOSviewer and CitNetExplorer: Software tools for bibliometric analysis of s...
VOSviewer and CitNetExplorer: Software tools for bibliometric analysis of s...
 
Bibliometric network analysis: Software tools, techniques, and an analysis o...
Bibliometric network analysis: Software tools, techniques, and an analysis o...Bibliometric network analysis: Software tools, techniques, and an analysis o...
Bibliometric network analysis: Software tools, techniques, and an analysis o...
 
Getting started with CitNetExplorer
Getting started with CitNetExplorerGetting started with CitNetExplorer
Getting started with CitNetExplorer
 
CWTS Leiden Ranking: An advanced bibliometric approach to university ranking
CWTS Leiden Ranking: An advanced bibliometric approach to university rankingCWTS Leiden Ranking: An advanced bibliometric approach to university ranking
CWTS Leiden Ranking: An advanced bibliometric approach to university ranking
 
Visualising data
Visualising dataVisualising data
Visualising data
 

Similar to Issi 2015-noyons-milanez ed

Presentation of ECOSTBio Action CM1305 at APC Keflavik (Iceland)
Presentation of ECOSTBio Action CM1305 at APC Keflavik (Iceland)Presentation of ECOSTBio Action CM1305 at APC Keflavik (Iceland)
Presentation of ECOSTBio Action CM1305 at APC Keflavik (Iceland)Marcel Swart
 
Metadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATOMetadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATOAlejandra Gonzalez-Beltran
 
ECCB 2014: Extracting patterns of database and software usage from the bioinf...
ECCB 2014: Extracting patterns of database and software usage from the bioinf...ECCB 2014: Extracting patterns of database and software usage from the bioinf...
ECCB 2014: Extracting patterns of database and software usage from the bioinf...geraintduck
 
Scientometric approaches to classification
Scientometric approaches to classificationScientometric approaches to classification
Scientometric approaches to classificationNees Jan van Eck
 
2016 Bio-IT World Cell Line Coordination 2016-04-06v1
2016 Bio-IT World Cell Line Coordination 2016-04-06v12016 Bio-IT World Cell Line Coordination 2016-04-06v1
2016 Bio-IT World Cell Line Coordination 2016-04-06v1Bruce Kozuma
 
Proposing a Scientific Paper Retrieval and Recommender Framework
Proposing a Scientific Paper Retrieval and Recommender FrameworkProposing a Scientific Paper Retrieval and Recommender Framework
Proposing a Scientific Paper Retrieval and Recommender FrameworkAravind Sesagiri Raamkumar
 
Ten basic guidelines for conducting and publishing a meta-analysis.pptx
Ten basic guidelines for conducting and publishing a meta-analysis.pptxTen basic guidelines for conducting and publishing a meta-analysis.pptx
Ten basic guidelines for conducting and publishing a meta-analysis.pptxPubrica
 
The role of annotation in reproducibility (Empirical 2014)
The role of annotation in reproducibility (Empirical 2014)The role of annotation in reproducibility (Empirical 2014)
The role of annotation in reproducibility (Empirical 2014)Oscar Corcho
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...Angelo Salatino
 
Experimental method of Educational Research.
Experimental method of Educational Research.Experimental method of Educational Research.
Experimental method of Educational Research.Neha Deo
 
Research Paper Writing For Microbiology In UK.pptx
Research Paper Writing For Microbiology In UK.pptxResearch Paper Writing For Microbiology In UK.pptx
Research Paper Writing For Microbiology In UK.pptxJohn William
 
Systematic reviews - a "how to" guide
Systematic reviews - a "how to" guideSystematic reviews - a "how to" guide
Systematic reviews - a "how to" guideIsla Kuhn
 
EVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - IntroductionEVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - IntroductionJonathan Eisen
 
A data-intensive assessment of the species abundance distribution
A data-intensive assessment of the species abundance distributionA data-intensive assessment of the species abundance distribution
A data-intensive assessment of the species abundance distributionElita Baldridge
 
Open Data and Ecological and Evolutionary synthesis
Open Data and Ecological and Evolutionary synthesisOpen Data and Ecological and Evolutionary synthesis
Open Data and Ecological and Evolutionary synthesisAntica Culina
 
Active research management and sharing
Active research management and sharingActive research management and sharing
Active research management and sharingJisc
 
Systematic review how to- easter 2014
Systematic review   how to- easter 2014Systematic review   how to- easter 2014
Systematic review how to- easter 2014Isla Kuhn
 

Similar to Issi 2015-noyons-milanez ed (20)

Presentation of ECOSTBio Action CM1305 at APC Keflavik (Iceland)
Presentation of ECOSTBio Action CM1305 at APC Keflavik (Iceland)Presentation of ECOSTBio Action CM1305 at APC Keflavik (Iceland)
Presentation of ECOSTBio Action CM1305 at APC Keflavik (Iceland)
 
Meta analysis_Sharanbasappa
Meta analysis_SharanbasappaMeta analysis_Sharanbasappa
Meta analysis_Sharanbasappa
 
Metadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATOMetadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATO
 
ECCB 2014: Extracting patterns of database and software usage from the bioinf...
ECCB 2014: Extracting patterns of database and software usage from the bioinf...ECCB 2014: Extracting patterns of database and software usage from the bioinf...
ECCB 2014: Extracting patterns of database and software usage from the bioinf...
 
Scientometric approaches to classification
Scientometric approaches to classificationScientometric approaches to classification
Scientometric approaches to classification
 
2016 Bio-IT World Cell Line Coordination 2016-04-06v1
2016 Bio-IT World Cell Line Coordination 2016-04-06v12016 Bio-IT World Cell Line Coordination 2016-04-06v1
2016 Bio-IT World Cell Line Coordination 2016-04-06v1
 
Proposing a Scientific Paper Retrieval and Recommender Framework
Proposing a Scientific Paper Retrieval and Recommender FrameworkProposing a Scientific Paper Retrieval and Recommender Framework
Proposing a Scientific Paper Retrieval and Recommender Framework
 
Ten basic guidelines for conducting and publishing a meta-analysis.pptx
Ten basic guidelines for conducting and publishing a meta-analysis.pptxTen basic guidelines for conducting and publishing a meta-analysis.pptx
Ten basic guidelines for conducting and publishing a meta-analysis.pptx
 
The role of annotation in reproducibility (Empirical 2014)
The role of annotation in reproducibility (Empirical 2014)The role of annotation in reproducibility (Empirical 2014)
The role of annotation in reproducibility (Empirical 2014)
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
 
Brm unit 2 2020
Brm unit 2 2020Brm unit 2 2020
Brm unit 2 2020
 
Experimental method of Educational Research.
Experimental method of Educational Research.Experimental method of Educational Research.
Experimental method of Educational Research.
 
Research Paper Writing For Microbiology In UK.pptx
Research Paper Writing For Microbiology In UK.pptxResearch Paper Writing For Microbiology In UK.pptx
Research Paper Writing For Microbiology In UK.pptx
 
Systematic reviews - a "how to" guide
Systematic reviews - a "how to" guideSystematic reviews - a "how to" guide
Systematic reviews - a "how to" guide
 
EVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - IntroductionEVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - Introduction
 
A data-intensive assessment of the species abundance distribution
A data-intensive assessment of the species abundance distributionA data-intensive assessment of the species abundance distribution
A data-intensive assessment of the species abundance distribution
 
Assay Development and Drug Repurposing Core
Assay Development and Drug Repurposing CoreAssay Development and Drug Repurposing Core
Assay Development and Drug Repurposing Core
 
Open Data and Ecological and Evolutionary synthesis
Open Data and Ecological and Evolutionary synthesisOpen Data and Ecological and Evolutionary synthesis
Open Data and Ecological and Evolutionary synthesis
 
Active research management and sharing
Active research management and sharingActive research management and sharing
Active research management and sharing
 
Systematic review how to- easter 2014
Systematic review   how to- easter 2014Systematic review   how to- easter 2014
Systematic review how to- easter 2014
 

Issi 2015-noyons-milanez ed

  • 1. A delineating procedure to retrieve publication data in research areas – the case of nanocellulose Dr. Douglas Milanez Centre for Technological Information in Materials/Federal University of São Carlos/Brazil Dr. Ed Noyons Centre for Science and Technology Studies /Leiden University/The Netherlands 15th International Conference on Scientometrics and Informetrics- Stambul, 2015
  • 2. The challenge to delineate scientific research areas • What is a research area, a field, discipline, topic? • Global or local delineation? • Classification systems  indispensable tool – Structure and dynamics of scientific fields – “As old as science” • Database classification – Journal assignment – Multidisciplinary/ general journals? 1
  • 3. The challenge to delineate scientific research areas • Classification systems  publication-level initiatives: – Text Similarity (Boyack et al., 2011) • Medline database (2.15 million publications) • More precise than the Medical Subject Headings – Citation approach (Waltman & van Eck, 2012) • Web of Science database (SCI and SSCI) • Clustering algorithm and further criteria • Labels extracted by text mining titles an abstracts 2 This study aims at proposing a delineation procedure to retrieve publication data to represent research areas
  • 5. 4 Nanocellulose • Sustainable nanomaterial – Any natural sources – Bacterial fermentation • Two types – cellulose nanofibrils – cellulose nanocrystals • Great potential for innovation – Composite materials – Packing material – Electronic devices – Texturizing agent – Medical devices
  • 6. From Wikipedia, the free encyclopedia Nanocellulose is a term referring to nano-structured cellulose. This may be either cellulose nanofibers (CNF) also called microfibrillated cellulose (MFC), nanocrystalline cellulose (NCC), or bacterial nanocellulose, which refers to nano-structured cellulose produced by bacteria. 5
  • 7. But … • There is no subject category to define nanocellulose • The area is difficult to capture by keywords: – Too narrow – Too broad 6
  • 8. Proposed approach (a bit local and global) • Use keyword search; • Define core set: seed publications; • Locate these seeds in classification; • Check results; • Define area by selection clusters from classification. 7
  • 9. • Overall delineation procedure 8 Methodology • Establishing a search expression • Web of Science online database Initial set of publication • CWTS Publication- level classification system • Lowest level Prior research areas retrieving • Cluster content analysis • Cleaning terms establishment Cleaning initial set of publication • Final set of publication • Selecting relevant research areas Final retrieving and selection
  • 12. Initial set of nanocellulose publications (seeds) • Terms found on literature and from expert suggestions • Web of Science • Search expression: 11 ("bacterial cellulos*") OR ("cellulos* crystal*") OR ("cellulos* nanocrystal*") OR ("cellulos* whisker*") OR ("cellulos* microcrystal*") OR ("cellulos* nanowhisker*") OR ("nanocrystal* cellulos*") OR ("cellulos* nano-whisker*") OR ("cellulos* nano- crystal*") OR ("nano-crystal cellulos*") OR ("cellulos* micro-crystal*") OR ("cellulos* microfibril*") OR ("microfibril* cellulos*") OR ("cellulos* nanofibril*") OR ("nanofibril* cellulos*") OR ("micro-fibril* cellulos*") OR ("nano-fibril* cellulos*") OR ("cellulos* micro-fibril*") OR ("cellulos* nano-fibril*") OR ("cellulos* nanofiber*") OR ("nanocellulos*") OR ("cellulos* nanoparticle*") OR ("nano- cellulos*") OR ("nanoparticl* cellulos*") OR ("nanosiz* cellulos*") OR ("cellulos* nanofill*") OR ("nano-siz* cellulos*") OR ("cellulos* nano-fiber*") OR ("cellulos* nano-particle*") OR ("cellulos* nano-fill*") OR ("nano-particl* cellulos*"))
  • 13. 12 Prior retrieve of nanocellulose research areas 0 250 500 750 1000 1250 1500 1750 2000 2250 2500 2750 3000 0 25 50 75 100 125 150 175 200 225 250 275 300 325 350 375 400 425 450 475 500 525 550 NumberofPublications Cluster (1- 533) 533 research areas! • 80% of clusters contained less than three publications from the initial core • Which clusters are relevant?
  • 14. What information do we have per cluster? A. Number of publications in a cluster; B. Number of publications retrieved by search strategy (seed set); C. Number of publications from seed per cluster. 13
  • 15. First results • Two clusters  Nuclei research topics – Contained 56.3% of the seed nanocellulose set; – Their (automated) labels relate to nanocellulose terms • Cellulose Nanocrystals and Cellulose Nanofibrils • Bacterial Cellulose (type of cellulose nanocrystal) Analysis of the nanocellulose pub. on peripheral clusters 14 NucleiPeripheral
  • 16. Analyzing the relevance of the peripheral research areas • approach: check if an article was focusing on nanocellulose or not: – Reading each title (and abstract if needed) – Nanocellulose should be the main subject – Only articles from the seed set of nanocellulose publications included on the cluster were checked – A total of 20 peripheral clusters were checked 15
  • 17. Percentage of noise of core-nanocellulose publications on peripheral research areas 0,0 5,0 10,0 15,0 20,0 25,0 30,0 35,0 40,0 45,0 50,0 0,0 10,0 20,0 30,0 40,0 50,0 60,0 70,0 80,0 90,0 100,0 5.1.21 5.1.4 13.19.17 13.6.2 13.19.3 13.6.1 9.13.4 13.6.33 13.6.43 13.6.7 18.5.4 13.6.17 13.6.22 13.6.12 13.6.9 13.6.31 13.6.29 1.33.9 13.18.1 13.11.41 Percentageproportion Percentageofnoisepublications Research area Left Axis Right Axis 16 Extremely noised research areas
  • 18. Improving the initial seed set of nanocellulose publications • Selecting terms from title and abstract: text mining approach • VOSviewer • Terms to ‘clean’ the first set of nanocellulose publications: 17 "gene" OR "xyloglucan" OR "microtubule" OR "*cyto*" OR "kinesi" OR "tubulin" OR "*cell wall*" OR "spindle" OR "phragmoplast" OR "mitosis" OR "preprophase" OR "phenotype" OR "*plant growth*" OR "meiosi" OR "*lignin distribution*" OR "delignification" OR "hemicellulose" OR "saccharification" OR "ethanol yield" OR "lignocellulos*" OR "glucosidase" OR "xylanase"
  • 19. Effect of cleaning the initial set of nanocellulose publications • Checking performed on peripheral research topics 18 0 20 40 60 80 100 120 140 160 5.1.21 5.1.4 13.19.17 13.6.2 13.19.3 13.6.1 9.13.4 13.6.33 13.6.43 13.6.7 18.5.4 13.6.17 13.6.22 13.6.12 13.6.9 13.6.31 13.6.29 1.33.9 13.18.1 13.11.41 Numberofpublications Research areas Before cleaning After cleaning Research areas related to biology
  • 20. Effect of cleaning the initial set of nanocellulose publications • Checking performed on nuclei research topics 19 0,0 1,0 2,0 3,0 4,0 5,0 6,0 "*cell wall*" "*cyto*" "*lignin distribution*" "*plant growth*" "delignification" "ethanol yield" "gene" "glucosidase" "hemicellulose" "kinesi" "lignocellulos*" "meiosi" "microtubule" "mitosis" "phenotype" "phragmoplast" "preprophase" "saccharification" "spindle" "tubulin" "xylanase" "xyloglucan" Percentage decrease 13.6.3 13.6.11 Cleaning terms The cleaning step did not affected significatively the nuclei research areas
  • 21. Independence test • Effect of cleaning procedure on the top five authors • They are important researchers in the subject of nanocellulose • Relevant authors to nanocellulose research • Their position as the top authors did not changed at all – Except for author E, who went down to the seventh position 20 Author Number of publication Decrease (%) Before After Overall Nuclei A 87 78 -10,3 -6,33 B 51 40 -21,6 -14,9 C 50 43 -14 0 D 50 39 -22 -18,2 E 48 29 -39,6 -26,5
  • 22. Final retrieving • 2,600 nanocellulose publications (core- nanocellulose) – 428 research areas – 81.0% clusters with one or two publications from the core- nanocellulose • Are they relevant? • Selecting relevant research areas – Pareto Principle (or 80/20 rule) – Research areas with one or two publications not considered • 85 clusters 21
  • 23. 22 Selecting relevant research areas • Pareto Principle: 0,0 10,0 20,0 30,0 40,0 50,0 60,0 70,0 80,0 90,0 100,0 0 10 20 30 40 50 60 70 80 90 Cumulativepercentagenumber Reserch areas 12 Research Topics
  • 24. 23 Map of the Nanocellulose main research topics (local map of globally identified topics, c.f. Boyack) Nuclei areas
  • 25. • Delineating scientific fields is a complex task: – Boundaries are not frequently well established since scientific studies – More and more exchange of knowledge between scientists from different disciplines is involved. • Our approach retrieves and delineates the nuclei and the peripheral research areas concerning nanocellulose research 24 Conclusion and next steps
  • 26. • Study the knowledge flow from peripheral research topics to the nuclei areas. • Map how they provide the necessary knowledge to face nanocellulose current challenges and how country and scientific institutions are contributing to this evolution. 25 Conclusion and next steps
  • 27. Dank u wel! Muito Obrigado! Thank you! Questions? Suggestions?

Editor's Notes

  1. Ed The idea of this slide is to introduce the challenge to delineate scientific research areas. I did not include any animation. Suggestion: You could start saying that classification system is always a subject to scientist as they like to categorize their research on a main area, such as chemistry, materials, engineering, etc. And it is as old as science. Then, the link is that the databases have initiatives to classify science using the journal assignment, but this approach has the drawback of not dealing properly with multidisciplinary journals. And currently, science has become more and more complex and interdisciplinary. For instance, how can we classify nanotechnology or biotechnology?
  2. Ed The idea here is to affirm that there area initiatives to classify the research areas using publication-level initiatives. I included the CWTS approach and other one that we have cited in our article. Both approaches were applied to a huge amount of papers indexed on databases. The link to the animation is that few bibliometric research aimed at perform an analysis to a specific topic or subject.
  3. This slide aims at presenting the subject: nanocellulose. What is important to know to understand the analysis further present on the slides: We can obtain nanocellulose from natural sources such as trees, natural fibers and agricultural waste Basically, we have two types of nanocellulose and they differ on the degree of crystallinity. Cellulose nanocrystals are more crystalline than nanofibrils The nanomaterial is interesting to innovation on new materials and devices You can also say that nanocellulose was the main topic of my thesis, and this is an strategic nanomaterial for Brazil, as we can found a wide variety of sources for nanocellulose. If you have any question, please, contact me.
  4. Ed The idea of this slide is to explain how the overall delineation procedure was conducted. It is important to highlight this is an interactive procedure. You can focus on the white box, because the following slides will detail the methodology: 1) Determine an initial set of publication concerning the theme of interest 2) Prior retrieval of nanocellulose research areas 3) Analysis of retrieved research area and cleaning of the initial set 4) Final retrieval and selection of relevant nanocellulose research areas.
  5. This slide and the next one explains the idea of our methodology. Each sphere correspond to an article clustered using the CWTS approach and published by Waltman & van Eck, 2012 We want to find which cluster contains a nanocellulose-related publication.
  6. This slide and the previous one explains the idea of our methodology. Each sphere correspond to an article clustered using the CWTS approach and published by Waltman & van Eck, 2012 We want to find which cluster contains a nanocellulose-related publication and then evaluate which of this clusters selected are relevant to nanocellulose.
  7. A search expression was developed considering several terms and synonyms recommended by experts and found in nanocellulose literature (Klemm et al., 2011; Milanez et al., 2013; Siqueira et al., 2010; Siró & Plackett, 2010), as can be seen from Table 1. The search expression encompassed different words that refer to cellulose nanocrystals, cellulose nanofibrils, and bacterial cellulose as well as other generic forms, such as nanocellulose, cellulose nanoparticles, and cellulose nanofiller. The search was conducted in March 31th 2014 in the online Web of Science database (topic search).
  8. The idea of this slide is to present the prior retrieve process. I think it is important to state that the number of publication from the X axis refers to the number of publication contained in each clusters, which includes nanocellulose publications 533 clusters were retrieved. The largest one contains 2,751 pub. while the smallest one contained 50 pub., as a criteria of the CWTS classification system. Which are relevant?
  9. The idea of this slide is to present the prior retrieve process. Two clusters highlighted since they cluster more nanocellulose publication that the others and it was interesting that their labels reveled that they talk about types of nanocellulose . It also says that nanocellulose has two important fields to describe it as a topic of research However 80% of the retrieved research area clustered less than 3 pub. And the important thing is: an evaluation of all research area retrieved would be too labour intensive, thus we made a selection of relevant clusters.
  10. You can say that I (as a Dr. in Materials Science and Engineering) could judge whether the article focused on nanocellulose or not. As cellulose is part from a complex of biological system, nanocellulose appeared in many studies. However, in many, the nanomaterial was not the focus.
  11. As to the 20 peripheral research topics whose nanocellulose set of publication were evaluated, no direct correlation was observed between the proportional relevance of each clusters and the percentage of noise, according to Figure 5. Four research topics had a high percentage (>70%) of ‘noisy’ publications mainly focusing on biological issues of plants, ethanol production, and enzymes aspects, not having the nanomaterial as a final object of research.
  12. The cleaning step focused on the first four cluster that were classified as extremely noised.
  13. Since the first four clusters were used to select the cleaning terms, the cleaning affected them highly. Two of them were even eliminated. Furthermore, other peripheral clusters had their nanocellulose publication coverage diminished.
  14. Half of the 22 terms we used to clean the nanocellulose search strategy did not affect the coverage of core-nanocellulose publications in the nuclei research areas, as depicted in Figure 4. To the other half, none term could reduce the coverage in more than 5%. The terms that influenced research area 13.6.4 the most were “*cell wall*” and “hemicelluloses” while “*cyto*”, “gene” and “*cell wall*” were the ones that decreased the most core-nanocellulose coverage in cluster 13.6.11. Overall, research topic 13.6.11 had its core-nanocellulose publication reduced in 17.5% while the decrease to cluster 13.6.3 was 10.2%. Nonetheless, both clusters still concentrated publication from the core-nanocellulose after the cleaning tasks (the proportion was 74.0% to research area 13.6.3 and 72.1% to 13.6.11). Therefore, they still had the status of nuclei research areas.
  15. An independency test was conducted to evaluate the effectiveness of the procedure proposed. The test involved retrieving the number of publication from the top five authors before and after cleaning and selecting the relevant research areas. The percentage decreases of their overall number of publication and from their main cluster were verified.
  16. We introduce here the Pareto Principle (or 80/20 rule). This principle states that “roughly 80% of the effects come from 20% of the causes” (Juran & Godfrey, 1998) and is found in bibliometric and library studies (Gupta, 1989; Kao, 2009; Stephens, Hubbard, Pickett, & Kimball, 2013). We hypothesize that 80% of the core set will be assigned to 20% of the areas. To reach these relevant research areas, the steps below were carried out: 1)The research areas were listed in descending order of the total number of publications from the core-nanocellulose; 2)Research topics with one or two publications from the core-nanocellulose were excluded. This yields 85 research areas remaining; 3)The representativeness of each research area was calculated by the number of publication of the core-nanocellulose of that cluster divided by 2,200 (which is the total of publication found in the 85 remaining research areas); 4)The cumulative percentage number of publications from the core-nanocellulose was obtained summing the values from the step before, as can be seen from Figure 3. The number of research to be assessed was those where the cumulative percentage number of publication reach approximately 80%. We found that twelve research areas covered the required 80%, which means 14.1% of the total of 85 research topics. We do not claim that our selecting procedure was perfect, but a quick analysis of the chosen research topics showed themes currently found in nanocellulose literature. According to Waltman and van Eck (2012), the lowest research area contain 50 publications, consequently, cluster with less than 1% of proportion were not account.
  17. Figure 6 presents a map with these research topics (nodes). These themes appears frequently in nanocellulose-focused studies .The map positions the topics on the basis of their citation relations. The closer two topics, the more frequent the citation traffic between them. The node labels match the main content of the clusters. Moreover, all selected clusters had their set of nanocellulose publication evaluated in the cleaning task.
  18. Derivated from the conclusion
  19. Derivated from the conclusion