Application of a Novel Subject Classification Scheme for a Bibliographic Database Using a Data-Driven Correspondence
Kei Kurakawa, Yuan Sun
National Institute of Informatics, Japan
Satoko Ando
Clarivate Analytics (Japan) Co., Ltd.
These are the presentation slides for the BigScholar 2019 workshop, held in conjunction with CIKM 2019 (ACM International Conference
on Information and Knowledge Management), Nov 7, 2019, at CNCC, Beijing, China.
Citation: Kurakawa K, Sun Y and Ando S (2020) Application of a Novel Subject Classification Scheme for a Bibliographic Database
Using a Data-Driven Correspondence. Front. Big Data 2:48. doi: 10.3389/fdata.2019.00048
Overview
• Introduction
• Motivation
• Applying a new subject classification scheme for a subject-classified bibliographic database
• Our main contributions
• Related work
• Theoretical background
• Subject classification model of the bibliographic database based on set theory
• Main steps of our data-driven approach
• Case study
• Applying the Japanese grants KAKENHI subject classification scheme for the Web of
Science citation database
• Conclusions and future work
Motivation
• In assessing research activities with bibliometrics, analysts are accustomed to using the major citation database Web of Science, whose subject classification schemes (WoS Subject Category, ESI, and GIPP) are prepared for qualitative analysis.
• Analysts need domestic subject classification schemes for their analysis, but these are not implemented in the database.
• Applying a new classification scheme to the database by hand is too labor-intensive and time-consuming.
• How can we apply a new classification scheme to the database efficiently and effectively?
Our main contributions
• We propose an approach for applying a novel subject classification scheme to a subject-classified database using a data-driven correspondence between the new scheme and the present one, a technique that is customary in digital libraries.
• We give a fundamental analytical model of a subject classification scheme based on set theory and describe the formation of a compact topological space for a new subject classification scheme as a necessary condition.
• We demonstrate the effectiveness and efficiency of our approach on a practical bibliographic database.
Related work
• In the field of computer science,
• Information retrieval
• Data mining
• Digital libraries
• Automated text categorization
• Classification (supervised learning)
• Naïve Bayes classification
• Neural networks
• Support vector machines
• Clustering (unsupervised learning)
• K-means
• Expectation maximization (EM)
• Hierarchical agglomerative clustering
• Divisive clustering
• Matrix decompositions
• More problem-specific methods
• Multi-label classification / multi-label
learning, based on
• SVM
• Deep learning
• Ensemble classification.
• Extreme multi-label classification, based on
• Graph embedding
• Convolutional neural network (CNN)
• Attention model of neural networks
• Label hierarchy considered
• A method of mapping between different
classification schemes
• Importing cataloguing records using a
different classification scheme in digital
libraries
• Information integration on the Web
Theoretical background
• Subject classification model of the bibliographic database
• Compact topological space formation for a new subject classification
scheme
• Inducing a correspondence between two subject classification
schemes using a research project database
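Given a bibliographic database
[Figure: a set $S$ of bibliographic records]
Given two categories
[Figure: two categories $O_1$ and $O_2$ in $S$, which yield an analytical basis]
Given many categories
• The family of categories $\mathfrak{O}^{(1)} = \{O_i\}$ on $S$ induces an analytical basis.
• The analytical basis generates a topology $(S, \mathfrak{O}^{1})$, where $\mathfrak{O}^{1}$ = {any unions of analytical basis}.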
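The construction of a topology from a family of categories can be tried on a small finite example. This is a minimal sketch, assuming the analytical basis is obtained the standard way a subbase generates a base (all finite intersections of the given categories, with $S$ itself as the empty intersection); the slides do not spell out the construction, and the function names and toy data are hypothetical.

```python
from itertools import combinations

def finite_intersections(universe, categories):
    """Base generated by a subbase: all intersections of finitely many categories."""
    whole = frozenset(universe)
    base = {whole}  # the empty intersection is the whole set
    for k in range(1, len(categories) + 1):
        for combo in combinations(categories, k):
            base.add(whole.intersection(*combo))
    return base

def all_unions(base):
    """Topology generated by a base: all unions of base members."""
    opens = {frozenset()}  # the empty union is the empty set
    for b in base:
        # after this step, opens holds every union of the base members seen so far
        opens |= {o | b for o in opens}
    return opens

# Toy database of four records with two overlapping categories (hypothetical data).
S = {1, 2, 3, 4}
categories = [frozenset({1, 2}), frozenset({2, 3})]
basis = finite_intersections(S, categories)
topology = all_unions(basis)
```

Enumerating every union is only feasible for tiny examples; with hundreds of categories the topology would be reasoned about symbolically rather than listed.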
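Given a finite cover
• If $\mathfrak{O}^{(1)} = \{O_i\}$ is a finite cover of $S$, then $(S, \mathfrak{O}^{1})$ is a compact topological space.
Another set of categories
• A second family $\mathfrak{O}^{(2)} = \{O_i\}$ likewise gives a compact topological space $(S, \mathfrak{O}^{2})$.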
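Whether a family of categories is a finite cover of the database can be checked directly. A minimal sketch, assuming records and categories are held as plain Python sets; the function names and data are hypothetical.

```python
def uncovered_records(universe, categories):
    """Return the records that belong to none of the given categories."""
    covered = set().union(*categories)
    return set(universe) - covered

def is_finite_cover(universe, categories):
    """A finite family of categories covers the universe if no record is left out."""
    return not uncovered_records(universe, categories)

# Toy database: record 4 is not classified by the second family.
S = {1, 2, 3, 4}
first_family = [{1, 2}, {2, 3}, {4}]
second_family = [{1, 2}, {3}]
missing = uncovered_records(S, second_family)
```

Here `first_family` covers `S`, while `second_family` leaves record 4 unclassified, so only the first satisfies the cover condition.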
If we have an external database such as …
• Research project database
[Figure: articles $S$ with a subset $S'$, projects $T$, a category $O$, and the maps $b$ and $h$ linking articles and projects]
• The projects form a compact topological space $(T, \mathfrak{O}_T^{2})$.
• We can define a compact topological space $(S', \mathfrak{O}_{S'}^{2})$ for the second set of categories.
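One way to picture how the second set of categories reaches the articles is to let each linked article inherit the categories of its projects. This is a sketch under that assumption only; the slides do not define the maps $b$ and $h$ in text, and the dictionaries and names below are hypothetical.

```python
from collections import defaultdict

def induce_article_categories(article_to_projects, project_to_categories):
    """Give each article the categories of every project it is linked to.

    Returns a dict mapping category -> set of articles, i.e. the second
    family of categories restricted to the linked articles.
    """
    induced = defaultdict(set)
    for article, projects in article_to_projects.items():
        for project in projects:
            for category in project_to_categories.get(project, ()):
                induced[category].add(article)
    return dict(induced)

# Hypothetical links between articles and projects, and project categories.
article_to_projects = {"a1": ["p1"], "a2": ["p1", "p2"], "a3": ["p3"]}
project_to_categories = {"p1": ["Informatics"], "p2": ["Mathematics"], "p3": []}
induced = induce_article_categories(article_to_projects, project_to_categories)
```

In this toy data article `a3` is linked only to an uncategorized project, so it receives no category and would have to be excluded from $S'$ for the induced family to remain a cover.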
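Compact topological spaces for the two subject classification schemes
• On $S'$, the first scheme $\mathfrak{O}^{(1)} = \{O_i^{(1)}\}$ gives the compact topological space $(S', \mathfrak{O}_{S'}^{1})$.
• On $S'$, the second scheme $\mathfrak{O}^{(2)} = \{O_i^{(2)}\}$ gives the compact topological space $(S', \mathfrak{O}_{S'}^{2})$.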
Correspondence between two subject classification schemes
[Figure: a category $O_j^{(2)}$ of the second scheme shown together with the categories $O_1^{(1)}, \ldots, O_7^{(1)}$ of the first scheme]
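Metrics for inducing a correspondence
Maximizing the $F_\beta$-measure.
$$d_{pj} = \frac{\left|\bigcup_{i \in I_j^{(1)}} \left(O_j^{(2)} \cap O_i^{(1)}\right)\right|}{\left|\bigcup_{i \in I_j^{(1)}} O_i^{(1)}\right|} \quad \text{(precision)},$$
$$d_{rj} = \frac{\left|\bigcup_{i \in I_j^{(1)}} \left(O_j^{(2)} \cap O_i^{(1)}\right)\right|}{\left|O_j^{(2)}\right|} \quad \text{(recall)},$$
$$d_{fj} = \frac{(1+\beta^2)\, d_{pj}\, d_{rj}}{\beta^2 d_{pj} + d_{rj}}, \quad \beta > 0 \quad (F_\beta\text{-measure}).$$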
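The three metrics can be computed directly from sets of article identifiers. A minimal sketch, assuming the numerators and denominators are cardinalities of unions over the selected first-scheme categories; the function name and the toy sets are hypothetical.

```python
def f_beta_for_selection(target, selected, beta=1.0):
    """Precision, recall and F-beta of a union of selected categories against a target.

    target   -- set of articles in one category of the second scheme
    selected -- list of sets, the chosen categories of the first scheme
    """
    union_selected = set().union(*selected)
    overlap = target & union_selected
    if not overlap:
        return 0.0, 0.0, 0.0
    precision = len(overlap) / len(union_selected)
    recall = len(overlap) / len(target)
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f_beta

# Hypothetical example: one target category and two selected categories.
target = {1, 2, 3, 4}
selected = [{1, 2, 5}, {2, 3}]
precision, recall, f1 = f_beta_for_selection(target, selected)
```

The union of the selected categories is {1, 2, 3, 5}, of which three articles fall in the target, so precision and recall are both 3/4 in this example.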
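In practical cases, we often build a contingency table for the two subject classification schemes
Rows: the first set of subject categories. Columns: the second set of subject categories.
$$\begin{array}{c|ccccc}
 & O_1^{(2)} & \cdots & O_j^{(2)} & \cdots & O_n^{(2)} \\ \hline
O_1^{(1)} & f_{11} & \cdots & f_{1j} & \cdots & f_{1n} \\
\vdots & \vdots & \ddots & \vdots & & \vdots \\
O_i^{(1)} & f_{i1} & \cdots & f_{ij} & \cdots & f_{in} \\
\vdots & \vdots & & \vdots & \ddots & \vdots \\
O_m^{(1)} & f_{m1} & \cdots & f_{mj} & \cdots & f_{mn}
\end{array}$$
$$f_{ij} = \left|O_i^{(1)} \cap O_j^{(2)}\right|$$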
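A contingency table of this kind can be tallied from articles that carry labels in both schemes. A minimal sketch, assuming each article maps to a list of category labels per scheme; the labels and names are hypothetical.

```python
from collections import Counter

def contingency_table(first_labels, second_labels):
    """Count, for every pair (i, j), the articles labelled i in scheme 1 and j in scheme 2."""
    table = Counter()
    # only articles classified under both schemes contribute
    for article in first_labels.keys() & second_labels.keys():
        for i in first_labels[article]:
            for j in second_labels[article]:
                table[(i, j)] += 1
    return table

# Hypothetical dual-labelled articles.
first_labels = {"a1": ["W1"], "a2": ["W1", "W2"], "a3": ["W2"]}
second_labels = {"a1": ["K1"], "a2": ["K1"], "a4": ["K2"]}
table = contingency_table(first_labels, second_labels)
```

Because an article may carry several labels in each scheme, a single article can add to more than one cell, so cell counts do not sum to the number of articles.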
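Pseudo metrics for inducing a correspondence
Maximizing the pseudo $F_\beta$-measure.
$$d'_{pj} = \frac{\sum_{i} \left|O_j^{(2)} \cap O_i^{(1)}\right|}{\sum_{i} \left|O_i^{(1)}\right|} \quad \text{(pseudo precision)},$$
$$d'_{rj} = \frac{\sum_{i} \left|O_j^{(2)} \cap O_i^{(1)}\right|}{\left|O_j^{(2)}\right|} \quad \text{(pseudo recall)},$$
$$d'_{fj} = \frac{(1+\beta^2)\, d'_{pj}\, d'_{rj}}{\beta^2 d'_{pj} + d'_{rj}}, \quad \beta > 0 \quad (\text{pseudo } F_\beta\text{-measure}).$$
Here $i$ ranges over the selected categories of the first scheme, and cardinalities are summed instead of being taken over unions.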
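The pseudo metrics need only counts, so they can be evaluated from one column of the contingency table plus the category sizes. The sketch below also shows one possible way to pick the selection: rank first-scheme categories by their share of articles in the target and keep the best-scoring prefix. That greedy search is an assumption for illustration, not necessarily the procedure used in the study, and all names and numbers are hypothetical.

```python
def pseudo_f_beta(counts, size_first, size_target, selected, beta=1.0):
    """Pseudo precision, recall and F-beta for a selection of first-scheme categories.

    counts      -- dict i -> f_ij for the fixed target category j
    size_first  -- dict i -> number of articles in first-scheme category i
    size_target -- number of articles in the target category j
    """
    hits = sum(counts[i] for i in selected)
    if hits == 0:
        return 0.0, 0.0, 0.0
    precision = hits / sum(size_first[i] for i in selected)
    recall = hits / size_target
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f_beta

def best_prefix(counts, size_first, size_target, beta=1.0):
    """Greedy search: rank by f_ij / |O_i| and keep the prefix with the highest score."""
    ranked = sorted(counts, key=lambda i: counts[i] / size_first[i], reverse=True)
    best_score, best_selection = 0.0, []
    for k in range(1, len(ranked) + 1):
        _, _, score = pseudo_f_beta(counts, size_first, size_target, ranked[:k], beta)
        if score > best_score:
            best_score, best_selection = score, ranked[:k]
    return best_score, best_selection

# Hypothetical column of the contingency table for one target category.
counts = {"A": 40, "B": 10, "C": 1}
size_first = {"A": 50, "B": 40, "C": 100}
best_score, best_selection = best_prefix(counts, size_first, size_target=60)
```

With these numbers, adding category B raises pseudo recall but lowers pseudo precision enough that the single category A gives the best pseudo $F_1$.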
Main steps of our approach
1. Inducing a correspondence between the two subject classification schemes by using the $F_\beta$-measure
1’-1. Constructing a contingency table between the two subject classification schemes
1’-2. Inducing a correspondence between the two subject classification schemes by using the pseudo $F_\beta$-measure
2. Revising the correspondence to guarantee the existence of a finite cover of the novel subject classification scheme
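Step 2 can be illustrated with a simple repair rule. If every category of the present scheme is mapped to at least one category of the new scheme, and the present scheme covers the database, then every record receives at least one new category. The sketch below assigns each unmapped category to the new category it co-occurs with most often; this rule, the function name, and the data are assumptions for illustration, not the revision procedure of the study.

```python
def revise_for_cover(correspondence, counts, first_categories):
    """Attach every unmapped first-scheme category to some new-scheme category.

    correspondence   -- dict j -> set of first-scheme categories mapped to j (non-empty dict)
    counts           -- dict (i, j) -> contingency count f_ij
    first_categories -- all categories of the present (first) scheme
    """
    revised = {j: set(members) for j, members in correspondence.items()}
    mapped = set().union(*revised.values())
    for i in first_categories:
        if i in mapped:
            continue
        # choose the new category with the largest co-occurrence count for i
        best_j = max(revised, key=lambda j: counts.get((i, j), 0))
        revised[best_j].add(i)
    return revised

# Hypothetical correspondence in which first-scheme category W3 is unmapped.
correspondence = {"K1": {"W1"}, "K2": {"W2"}}
counts = {("W3", "K1"): 5, ("W3", "K2"): 12}
revised = revise_for_cover(correspondence, counts, ["W1", "W2", "W3"])
```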
Case study
• InCites™ (Clarivate Analytics)
• A world-class research evaluation platform
• Web of Science™ citation database
• Web of Science classification scheme (251 categories)
• Essential Science Indicators (ESI) classification scheme (22 categories)
• Japanese users are eager to use the subject classification scheme of KAKENHI, Japan’s largest national research grants.
• KAKEN (NII) research project database
• Archival records of research projects and the outputs of KAKENHI grants in Japan
• KAKENHI subject classification scheme (hierarchical classification scheme; 4 categories, 10 areas, 67 disciplines, and 284 research fields)
WoS subject classification scheme
Source: https://images.webofknowledge.com/images/help/WOS/hp_subject_category_terms_tasca.html
KAKENHI subject classification scheme
Source: https://www.jsps.go.jp/english/e-grants/data/09_2008/21startup_yoryo2_e.pdf
Developing a contingency table as evidence data
• Using a set of record linkage techniques, we identified the articles recorded in KAKEN that are the same bibliographic records in the WoS citation database as of 2009 and 2010. This yields a set of articles $S'$ that are classified using both the KAKENHI and WoS classification schemes.
[Figure: bibliographic linkage between Web of Science (articles $S$) and KAKEN (articles and projects $T$); articles $a_1, a_2, a_3$ and $a'_1, a'_2, a'_3$ in the two databases are matched as $a'_1 \equiv a_1$, $a'_2 \equiv a_2$, $a'_3 \equiv a_3$, and the matched articles form $S'$]
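The slides do not say which record linkage techniques were used. As a rough illustration of the idea only, the sketch below links records whose normalized title and publication year agree exactly; the field names (`id`, `title`, `year`) and the sample records are hypothetical, and real linkage would also need to handle non-ASCII titles, typos, and missing fields.

```python
import re

def linkage_key(title, year):
    """Normalize a title to lowercase alphanumeric words and pair it with the year."""
    normalized = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return (normalized, year)

def link_records(kaken_articles, wos_articles):
    """Map each KAKEN article id to the WoS record id that has the same key."""
    wos_index = {linkage_key(r["title"], r["year"]): r["id"] for r in wos_articles}
    links = {}
    for record in kaken_articles:
        key = linkage_key(record["title"], record["year"])
        if key in wos_index:
            links[record["id"]] = wos_index[key]
    return links

# Hypothetical records from the two databases.
kaken = [
    {"id": "k1", "title": "Subject Classification: A Survey", "year": 2009},
    {"id": "k2", "title": "Unmatched report", "year": 2010},
]
wos = [{"id": "w1", "title": "subject classification - a survey", "year": 2009}]
links = link_records(kaken, wos)
```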
A contingency table between WoS and KAKENHI subject classification schemes
[Figure: a part of the contingency table, 251 WoS categories × 67 KAKENHI disciplines]
Analysis of the contingency table
The discrete generalized beta distribution (DGBD)
$$f(r) = A\,\frac{(N + 1 - r)^{b}}{r^{a}},$$
where $r$ is the rank value, $N$ its maximum value, $A$ a normalization constant, and $a$ and $b$ two fitting parameters.
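A DGBD can be fitted to ranked frequencies by ordinary least squares on the logarithms. This is a minimal sketch, assuming the standard form $f(r) = A (N + 1 - r)^b / r^a$, so that $\log f = \log A - a \log r + b \log(N + 1 - r)$ is linear in the three unknowns. The slides do not state how the fit was done; the function names and the synthetic data are hypothetical.

```python
import math

def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, 3):
            factor = a[k][col] / a[col][col]
            for c in range(col, 4):
                a[k][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (a[i][3] - sum(a[i][c] * x[c] for c in range(i + 1, 3))) / a[i][i]
    return x

def fit_dgbd(frequencies):
    """Return (A, a, b) fitted by least squares on log-frequencies."""
    f = sorted((v for v in frequencies if v > 0), reverse=True)  # rank 1 = largest
    n = len(f)
    rows = [(1.0, -math.log(r), math.log(n + 1 - r)) for r in range(1, n + 1)]
    y = [math.log(v) for v in f]
    # normal equations (X^T X) w = X^T y for w = (log A, a, b)
    xtx = [[sum(p[i] * p[j] for p in rows) for j in range(3)] for i in range(3)]
    xty = [sum(p[i] * t for p, t in zip(rows, y)) for i in range(3)]
    log_A, a_exp, b_exp = solve3(xtx, xty)
    return math.exp(log_A), a_exp, b_exp

# Synthetic frequencies generated from a known DGBD (A=100, a=0.8, b=0.5, N=20).
synthetic = [100 * (21 - r) ** 0.5 / r ** 0.8 for r in range(1, 21)]
A_fit, a_fit, b_fit = fit_dgbd(synthetic)
```

Fitting in log space weights small frequencies as heavily as large ones; a nonlinear fit on the raw counts would be an alternative.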
Maximum pseudo $F_1$-measure for the third-level 67 disciplines of the KAKENHI subject categories against the 251 WoS subject categories
[Figure: panels labeled "Dispersal type" and "Concentration type"]
Columns: seq. no. among the third-level 67 disciplines; KAKENHI subject category; translation; cardinality; no. of WoS subject categories giving the max pseudo F1-measure; pseudo precision; pseudo recall; max pseudo F1-measure
(l3-01) 情報学 Informatics 6637 17 0.576 0.626 0.600
(l3-02) 神経科学 Brain sciences 1570 1 0.218 0.365 0.273
(l3-03) 実験動物学 Laboratory animal science 242 1 0.059 0.074 0.066
(l3-04) 人間医工学 Biomedical engineering 1995 8 0.222 0.213 0.217
(l3-05) 健康・スポーツ科学 Health / sports science 844 5 0.181 0.290 0.223
(l3-06) 生活科学 Human life science 467 4 0.239 0.281 0.258
(l3-07) 科学教育・教育工学 Science education / educational technology 388 2 0.377 0.103 0.162
(l3-08) 科学社会学・科学技術史 Sociology / history of science and technology 43 6 0.111 0.163 0.132
(l3-09) 文化財科学 Cultural assets study 55 1 0.200 0.036 0.062
(l3-10) 地理学 Geography 148 4 0.117 0.203 0.149
(l3-11) 環境学 Environmental science 2136 14 0.262 0.385 0.312
(l3-12) ナノ・マイクロ科学 Nano / micro science 1852 4 0.103 0.313 0.155
(l3-13) 社会・安全システム科学 Social / safety system science 868 14 0.187 0.214 0.199
(l3-14) ゲノム科学 Genome science 394 3 0.040 0.203 0.067
(l3-15) 生物分子科学 Biomolecular science 875 2 0.119 0.325 0.174
(l3-16) 資源保全学 Resource conservation science 172 3 0.181 0.145 0.161
(l3-17) 地域研究 Area studies 85 7 0.164 0.271 0.204
(l3-18) ジェンダー Gender 27 3 0.231 0.111 0.150
(l3-19) 哲学 Philosophy 60 4 0.436 0.283 0.343
(l3-20) 芸術学 Art Studies 9 1 0.091 0.111 0.100
(l3-21) 文学 Literature 41 10 0.700 0.683 0.691
(l3-22) 言語学 Linguistics 239 3 0.705 0.410 0.519
(l3-23) 史学 History 82 6 0.412 0.341 0.373
(l3-24) 人文地理学 Human Geography 14 3 0.175 0.500 0.259
(l3-25) 文化人類学 Cultural Anthropology 38 3 0.056 0.105 0.073
(l3-26) 法学 Law 41 3 0.385 0.122 0.185
(l3-27) 政治学 Politics 59 2 0.409 0.458 0.432
(l3-28) 経済学 Economics 992 12 0.692 0.622 0.655
(l3-29) 経営学 Management 130 5 0.294 0.385 0.333
(l3-30) 社会学 Sociology 90 8 0.176 0.278 0.216
(l3-31) 心理学 Psychology 794 14 0.488 0.479 0.483
(l3-32) 教育学 Education 151 9 0.244 0.258 0.251
(l3-33) 数学 Mathematics 2589 4 0.734 0.792 0.762
(l3-34) 天文学 Astronomy 1005 1 0.505 0.870 0.639
(l3-35) 物理学 Physics 5199 6 0.498 0.651 0.565
(l3-36) 地球惑星科学 Earth and Planetary Science 2099 7 0.619 0.662 0.640
(l3-37) プラズマ科学 Plasma Science 508 1 0.233 0.191 0.210
(l3-38) 基礎化学 Basic Chemistry 2448 7 0.229 0.801 0.356
(l3-39) 複合化学 Applied Chemistry 3573 6 0.283 0.526 0.368
(l3-40) 材料化学 Materials Chemistry 1635 7 0.157 0.348 0.216
(l3-41) 応用物理学・工学基礎 Applied Physics 2235 5 0.170 0.394 0.238
(l3-42) 機械工学 Mechanical Engineering 2675 11 0.431 0.388 0.408
(l3-43) 電気電子工学 Electrical and Electronic Engineering 4875 10 0.338 0.669 0.449
(l3-44) 土木工学 Civil Engineering 711 8 0.371 0.484 0.420
(l3-45) 建築学 Architecture and Building Engineering 170 3 0.286 0.506 0.365
(l3-46) 材料工学 Material Engineering 2931 6 0.348 0.523 0.418
(l3-47) プロセス工学 Process/Chemical Engineering 1283 4 0.145 0.306 0.197
(l3-48) 総合工学 Integrated Engineering 1465 8 0.256 0.309 0.280
(l3-49) 基礎生物学 Basic Biology 2423 7 0.375 0.400 0.387
(l3-50) 生物科学 Biological Science 2679 4 0.167 0.582 0.259
(l3-51) 人類学 Anthropology 300 3 0.315 0.440 0.367
(l3-52) 農学 Plant Production and Environmental Agriculture 899 4 0.307 0.449 0.365
(l3-53) 農芸化学 Agricultural Chemistry 1755 6 0.220 0.386 0.281
(l3-54) 林学 Forest and Forest Products Science 559 5 0.408 0.252 0.312
(l3-55) 水産学 Applied Aquatic Science 581 2 0.419 0.327 0.367
(l3-56) 農業経済学 Agricultural Science in Society and Economy 31 2 0.333 0.097 0.150
(l3-57) 農業工学 Agro-Engineering 216 4 0.157 0.259 0.195
(l3-58) 畜産学・獣医学 Animal Life Science 1190 4 0.511 0.387 0.440
(l3-59) 境界農学 Boundary Agriculture 541 4 0.235 0.148 0.181
(l3-60) 薬学 Pharmacy 3457 4 0.294 0.369 0.328
(l3-61) 基礎医学 Basic Medicine 5232 16 0.213 0.551 0.307
(l3-62) 境界医学 Boundary Medicine 850 12 0.162 0.112 0.132
(l3-63) 社会医学 Society Medicine 1065 8 0.282 0.262 0.271
Averages: cardinality 1450.4; no. of WoS subject categories 6.1; pseudo precision 0.315; pseudo recall 0.367; pseudo F1-measure 0.317
Miscellaneous considerations
• Decision by an expert
• Limit the number of correspondences to 1–4 for each $O_i^{(1)}$.
• For every Web of Science subject category $O_i^{(1)}$, the number of relations with KAKENHI subject categories $O_j^{(2)}$ is limited to 4 at most.
• For every Web of Science subject category $O_i^{(1)}$, when the recall rate exceeds one half, we stop adding any more relations.
• Check all correspondences between $O_i^{(1)}$ and $O_j^{(2)}$.
• Add or remove correspondence relations between them by means of subject classification keywords.
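The two limiting rules above can be written as a small selection loop. A sketch under stated assumptions: relations are added in descending order of contingency count, and the recall rate of a WoS category is taken to be the share of its linked articles covered by the KAKENHI categories selected so far. The slides leave both points open, the expert review steps are not modeled, and the names and numbers are hypothetical.

```python
def select_relations(row_counts, category_size, max_relations=4, recall_stop=0.5):
    """Pick KAKENHI categories for one WoS category under the two limiting rules.

    row_counts    -- dict j -> f_ij for the fixed WoS category i
    category_size -- number of linked articles in WoS category i (must be > 0)
    """
    selected, covered = [], 0
    ranked = sorted(row_counts.items(), key=lambda item: item[1], reverse=True)
    for category, count in ranked:
        if count == 0 or len(selected) >= max_relations:
            break
        if covered / category_size > recall_stop:
            break  # recall already exceeds one half: stop adding relations
        selected.append(category)
        covered += count
    return selected

# Hypothetical row of the contingency table for one WoS category with 80 articles.
row_counts = {"Informatics": 30, "Mathematics": 15, "Physics": 10, "Law": 1}
relations = select_relations(row_counts, category_size=80)
```

After Informatics and Mathematics are added, 45 of 80 articles are covered, which exceeds one half, so no third relation is added.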
Inducing a correspondence between KAKENHI subject classification scheme and WoS subject classification scheme
[Figure: parts of the mapping list for the 10 areas of KAKENHI and for the 67 disciplines of KAKENHI]
Source: http://help.prod-incites.com/inCites2Live/filterValuesGroup/researchAreaSchema/kaken/version/16
Example screen of InCites™
[Figure: InCites™ screen; the bubbles represent proportional numbers of articles classified using the KAKENHI subject categories]
• WoS Documents: 58,395,008 for Web of Science subject categories
• WoS Documents: 3,192,449 for Web of Science subject categories, limited with “LOCATION = JAPAN”
• WoS Documents: 3,191,448 for KAKEN L3 subject categories, limited with “LOCATION = JAPAN”
• (a snapshot of 2018-12-14)
Top 30 subject distribution of Japanese authors’ articles with the two subject classification schemes
[Figure: two panels, WoS subject classification scheme and KAKENHI subject classification scheme]
User feedback: Questions and answers on the validity of the KAKENHI subject classification scheme
• KAKEN classification scheme
• April 2016, released on InCites Benchmarking
• User survey
• March 2017, by online questionnaire for institutional active users
• 18 questions
• Results
• Feedback from 26 institutional users
• Q7: Which levels of hierarchy in the KAKENHI subject classification scheme do you need?
• Q11: Do you feel comfortable with your analysis results by the KAKENHI subject classification scheme in accordance with your experience?
User role in the institution: Yes (multiple answers possible)
RA (research administrator): 20
Administrator / officer: 3
IR (institutional research) staff: 5
Others: 2
Other: 1, "I need more detailed categories"
Discussion (1)
• Our approach, i.e., deciding a correspondence between two subject classification schemes, has an inherent limitation.
• In the natural correlations between the subject categories of two classification schemes, each subject category of one scheme partly overlaps several subject categories of the other scheme.
• There is no inclusion relationship between them.
• Correspondence relations are probabilistic.
• Research projects and journal articles have both similarities and differences in subject.
• Projects and articles are strongly correlated in subject.
• But they also differ in subject.
• Projects precede articles.
• Projects tend to indicate the central concept with essential keywords.
Discussion (2)
• Nevertheless, the classification results were accepted by InCites users.
• Our approach requires less workload.
• The number of journal titles in the Web of Science citation database is 24,688.
• The number of Web of Science documents in InCites is 58,395,008.
• The number of subject category pairs for which a correspondence must be decided is 16,817.
• For KAKEN 67 - WoS 251, the number of pairs is 16,817.
• For KAKEN 10 - WoS 251, the number of pairs is 2,510.
• But the evidence data is not sufficient for automatic decision making.
• The sum of the frequency counts in the contingency table is 97,175.
• Manual handling was needed.
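The pair counts quoted in the discussion follow from the sizes of the two schemes, and dividing the total frequency count by the number of cells gives a rough sense of how thin the evidence is. The average per cell is a derived figure for illustration, not a number reported in the slides.

```python
# Sizes of the classification schemes as stated in the case study.
wos_categories = 251
kakenhi_disciplines = 67
kakenhi_areas = 10
total_frequency = 97_175  # sum of frequency counts in the contingency table

pairs_disciplines = kakenhi_disciplines * wos_categories  # KAKEN 67 - WoS 251
pairs_areas = kakenhi_areas * wos_categories              # KAKEN 10 - WoS 251

# Average number of dual-classified observations per cell (derived, illustrative).
mean_per_cell = total_frequency / pairs_disciplines

print(pairs_disciplines, pairs_areas, round(mean_per_cell, 2))
```

An average of fewer than six observations per cell, spread very unevenly in practice, is consistent with the remark that manual handling was needed.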
Conclusions and future work
• Conclusions
• We proposed an approach for applying a new subject classification scheme to a bibliographic database that is already classified under another subject classification scheme.
• We gave a fundamental analytical model of a subject classification scheme based on set theory.
• Compact topological space formation for a new subject classification scheme is a necessary condition.
• An external database, e.g., a research project database, is used to induce a correspondence between the two subject classification schemes.
• We applied the approach to a practical example: InCites™, a research evaluation tool based on the Web of Science citation database, to which we added the subject classification scheme of KAKENHI, Japan’s largest national grants. The user survey indicates that users generally accept the new function.
• Future work
• For a complex classification scheme such as a hierarchical one, our approach should be extended to take the scheme's structure into account.
• Alternatively, multi-label learning is another possible way to reach our goal. We need to compare it with our method.
Acknowledgments
• This presentation is a result of joint research between the National Institute of Informatics and Clarivate Analytics, Co., Ltd. As for the databases used in this presentation, the KAKEN database is provided by the National Institute of Informatics, Cyber Science Infrastructure Development Department, Scholarly and Academic Information Division, and the Web of Science citation database is provided by Clarivate Analytics, Co., Ltd. We are thankful to these organizations for letting us use their valuable assets.

ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials Design
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
 
E bank uk_linking_research_data_scholarly
E bank uk_linking_research_data_scholarlyE bank uk_linking_research_data_scholarly
E bank uk_linking_research_data_scholarly
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
 

Application of a Novel Subject Classification Scheme for a Bibliographic Database Using a Data-Driven Correspondence

  • 1. Application of a Novel Subject Classification Scheme for a Bibliographic Database Using a Data-Driven Correspondence. Kei Kurakawa, Yuan Sun (National Institute of Informatics, Japan); Satoko Ando (Clarivate Analytics (Japan) Co., Ltd.). These are the presentation slides for the workshop BigScholar 2019, held in conjunction with CIKM 2019 (ACM International Conference on Information and Knowledge Management), Nov 7, 2019, at CNCC, Beijing, China. Citation: Kurakawa K, Sun Y and Ando S (2020) Application of a Novel Subject Classification Scheme for a Bibliographic Database Using a Data-Driven Correspondence. Front. Big Data 2:48. doi: 10.3389/fdata.2019.00048
  • 2. Overview • Introduction • Motivation • Applying a new subject classification scheme for a subject-classified bibliographic database • Our main contributions • Related work • Theoretical background • Subject classification model of the bibliographic database based on set theory • Main steps of our data-driven approach • Case study • Applying the Japanese grants KAKENHI subject classification scheme for the Web of Science citation database • Conclusions and future work
  • 3. Motivation • In assessing research activities based on bibliometrics, analysts are accustomed to using the major citation database Web of Science, whose subject classification schemes (WoS Subject Category, ESI, and GIPP) are prepared for qualitative analysis. • Analysts need domestic subject classification schemes for their analysis, which are not implemented in the database. • Applying a new classification scheme to the database by hand is too labor-intensive and time-consuming. • How can we apply a new classification scheme to the database efficiently and effectively?
  • 4. Our main contributions • We propose an approach to apply a novel subject classification scheme to a subject-classified database using a data-driven correspondence between the new and present ones, which is accustomed to digital libraries. • We give a fundamental analytical model of a subject classification scheme based on set theory and describe compact topological space formation for a new subject classification scheme as a necessary condition. • We demonstrate the effectiveness and efficiency of our approach on a practical bibliographic database.
  • 5. Related work • In the field of computer science, • Information retrieval • Data mining • Digital libraries • Automated text categorization • Classification (supervised learning) • Naïve Bayes classification • Neural networks • Support vector machines • Clustering (unsupervised learning) • K-means • Expectation maximization (EM) • Hierarchical agglomerative clustering • Divisive clustering • Matrix decompositions • More problem-specific methods • Multi-label classification / multi-label learning, based on • SVM • Deep learning • Ensemble classification • Extreme multi-label classification, based on • Graph embedding • Convolutional neural network (CNN) • Attention model of neural networks • Label hierarchy considered • A method of mapping between different classification schemes • Importing cataloguing records using a different classification scheme in digital libraries • Information integration on the Web
  • 6. Theoretical background • Subject classification model of the bibliographic database • Compact topological space formation for a new subject classification scheme • Inducing a correspondence between two subject classification schemes using a research project database
  • 7. Given a bibliographic database $S$
  • 9. Given two categories $O_1$ and $O_2$ in $S$ (analytical basis)
  • 11. Given many categories in $S$: the analytical basis $\mathfrak{O}^{(1)} = \{O_i\}$ induces the topology $(S, \mathfrak{O}_{1})$, where $\mathfrak{O}_{1}$ = {any unions of analytical basis}
  • 12. Given a finite cover of $S$ by $\mathfrak{O}^{(1)} = \{O_i\}$: compact topological space $(S, \mathfrak{O}_{1})$
  • 13. Another set of categories on $S$, $\mathfrak{O}^{(2)} = \{O_i\}$: compact topological space $(S, \mathfrak{O}_{2})$
  • 14. If we have an external database such as … • Research project database. [Figure: articles $S$ with $S'$, projects $T$, and the labels $b$, $O$, $h$.] Compact topological space $(T, \mathfrak{O}_{T}^{2})$
  • 15. If we have an external database such as … • Research project database. [Figure: articles $S$ with $S'$, projects $T$, and the labels $b$, $O$, $h$.] Compact topological space $(T, \mathfrak{O}_{T}^{2})$; compact topological space $(S', \mathfrak{O}_{S'}^{2})$. We can define a compact topological space for the second set of categories.
  • 16. Compact topological spaces for the two subject classification schemes on $S'$: $\mathfrak{O}^{1} = \{O_i^{1}\}$ gives $(S', \mathfrak{O}_{S'}^{1})$, and $\mathfrak{O}^{2} = \{O_i^{2}\}$ gives $(S', \mathfrak{O}_{S'}^{2})$
  • 17. Correspondence between two subject classification schemes. [Figure: $O_j^{2}$ shown with $O_1^{1}, O_2^{1}, \dots, O_7^{1}$.]
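  • 18. Metrics for inducing a correspondence: maximizing the $F_\beta$-measure.
    $d_{pj} = \dfrac{\left| \bigcup_{i \in I_j^{1}} \left( O_j^{2} \cap O_i^{1} \right) \right|}{\left| \bigcup_{i \in I_j^{1}} O_i^{1} \right|}$ (precision),
    $d_{rj} = \dfrac{\left| \bigcup_{i \in I_j^{1}} \left( O_j^{2} \cap O_i^{1} \right) \right|}{\left| O_j^{2} \right|}$ (recall),
    $d_{fj} = \dfrac{(1 + \beta^{2})\, d_{pj}\, d_{rj}}{\beta^{2} d_{pj} + d_{rj}}, \quad \beta > 0$ ($F_\beta$-measure).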
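The precision, recall, and $F_\beta$ definitions on this slide can be illustrated with plain set operations. The sketch below is only an illustration under stated assumptions: categories are represented as Python sets of record identifiers, the selection is read as a union of the chosen first-scheme categories, and the function name `f_beta_for_selection` and the toy data are hypothetical, not taken from the slides.

```python
def f_beta_for_selection(target, categories, selected, beta=1.0):
    """Score a selection of first-scheme categories against one target category.

    target     -- set of record ids in the second-scheme category O_j^2
    categories -- dict: first-scheme category name -> set of record ids (O_i^1)
    selected   -- iterable of category names playing the role of I_j^1
    Returns (precision, recall, f_beta).
    """
    # Union of the selected first-scheme categories.
    union = set().union(*(categories[name] for name in selected))
    overlap = target & union
    precision = len(overlap) / len(union) if union else 0.0
    recall = len(overlap) / len(target) if target else 0.0
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


# Toy data (hypothetical): one target category and three candidate categories.
target = {1, 2, 3, 4}
categories = {"a": {1, 2, 5}, "b": {2, 3}, "c": {6, 7}}
precision, recall, f1 = f_beta_for_selection(target, categories, ["a", "b"])
```

With this toy data the union of "a" and "b" is {1, 2, 3, 5}, three of whose records lie in the target, so precision and recall are both 3/4.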
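  • 19. In practical cases, we often build a contingency table for the two subject classification schemes. Rows: the first set of subject categories; columns: the second set of subject categories.
    $$
    \begin{array}{c|ccccc}
     & O_1^{2} & \cdots & O_j^{2} & \cdots & O_n^{2} \\ \hline
    O_1^{1} & f_{11} & \cdots & f_{1j} & \cdots & f_{1n} \\
    \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
    O_i^{1} & f_{i1} & \cdots & f_{ij} & \cdots & f_{in} \\
    \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
    O_m^{1} & f_{m1} & \cdots & f_{mj} & \cdots & f_{mn}
    \end{array}
    \qquad f_{ij} = \left| O_i^{1} \cap O_j^{2} \right|
    $$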
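A contingency table of this kind can be counted directly from records that carry labels from both schemes. The following is a minimal sketch, assuming each record is available as a pair of label collections; the function name `contingency_table` and the labels "W1", "K1", and so on are hypothetical placeholders.

```python
from collections import Counter


def contingency_table(records):
    """Count co-occurrences of first-scheme and second-scheme labels.

    records -- iterable of (first_labels, second_labels) pairs, one per record.
    Returns a Counter keyed by (first_label, second_label); each value plays
    the role of f_ij, the number of records carrying both labels.
    """
    table = Counter()
    for first_labels, second_labels in records:
        # Deduplicate labels so a record is counted once per label pair.
        for i in set(first_labels):
            for j in set(second_labels):
                table[(i, j)] += 1
    return table


# Toy data (hypothetical): three records with labels from both schemes.
records = [
    (["W1", "W2"], ["K1"]),
    (["W1"], ["K1", "K2"]),
    (["W2"], ["K2"]),
]
table = contingency_table(records)
```

Because a record may carry several labels in either scheme, the cell counts can sum to more than the number of records.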
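  • 20. Pseudo metrics for inducing a correspondence: maximizing the pseudo $F_\beta$-measure.
    $d'_{pj} = \dfrac{\sum_{i} \left| O_j^{2} \cap O_i^{1} \right|}{\sum_{i} \left| O_i^{1} \right|}$ (pseudo precision),
    $d'_{rj} = \dfrac{\sum_{i} \left| O_j^{2} \cap O_i^{1} \right|}{\left| O_j^{2} \right|}$ (pseudo recall),
    $d'_{fj} = \dfrac{(1 + \beta^{2})\, d'_{pj}\, d'_{rj}}{\beta^{2} d'_{pj} + d'_{rj}}, \quad \beta > 0$ (pseudo $F_\beta$-measure).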
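The slide does not say how the maximization is carried out. One simple possibility, offered here purely as an assumption, is a greedy search: rank the first-scheme categories by their overlap with the target, add them one at a time, and keep the prefix with the highest pseudo $F_\beta$. In the sketch below the sums are read as running over the selected categories, and the names `best_pseudo_f_beta`, `overlaps`, and `sizes`, as well as the toy numbers, are hypothetical.

```python
def best_pseudo_f_beta(overlaps, sizes, target_size, beta=1.0):
    """Greedy search for the selection with the highest pseudo F_beta.

    overlaps    -- dict: first-scheme category -> |O_j^2 ∩ O_i^1| (one table column)
    sizes       -- dict: first-scheme category -> |O_i^1|
    target_size -- |O_j^2|
    Returns (selected_categories, pseudo_f_beta).
    """
    # Assumption: candidates are tried in order of decreasing overlap.
    ranked = sorted(overlaps, key=lambda name: overlaps[name], reverse=True)
    best_selection, best_score = [], 0.0
    overlap_sum, size_sum = 0, 0
    for k, name in enumerate(ranked, start=1):
        overlap_sum += overlaps[name]
        size_sum += sizes[name]
        precision = overlap_sum / size_sum if size_sum else 0.0
        recall = overlap_sum / target_size if target_size else 0.0
        denom = beta ** 2 * precision + recall
        score = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
        if score > best_score:
            best_selection, best_score = ranked[:k], score
    return best_selection, best_score


# Toy column of a contingency table (hypothetical numbers).
overlaps = {"W1": 30, "W2": 10, "W3": 1}
sizes = {"W1": 40, "W2": 50, "W3": 100}
selection, score = best_pseudo_f_beta(overlaps, sizes, target_size=50)
```

In this toy case, adding "W2" raises pseudo recall from 0.6 to 0.8 but drops pseudo precision from 0.75 to about 0.44, so the single category "W1" gives the best score. Since records with several labels are counted once per label, pseudo recall computed this way is not guaranteed to stay below 1.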
  • 21. Main steps of our approach
    1. Inducing a correspondence between the two subject classification schemes by using the $F_\beta$-measure
    1’-1. Constructing a contingency table between the two subject classification schemes
    1’-2. Inducing a correspondence between the two subject classification schemes by using the pseudo $F_\beta$-measure
    2. Revising the correspondence to guarantee the existence of a finite cover of the novel subject classification scheme
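Step 2 asks that the new categories still form a finite cover of the record set. A minimal sketch of such a check follows, assuming that each new category is built as the union of the existing categories mapped to it and that categories are Python sets of record identifiers. The function names and the toy mapping are hypothetical.

```python
def induced_categories(correspondence, existing):
    """Build each new category as the union of the existing categories mapped to it.

    correspondence -- dict: new category -> list of existing category names
    existing       -- dict: existing category name -> set of record ids
    """
    return {
        new: set().union(*(existing[name] for name in names))
        for new, names in correspondence.items()
    }


def uncovered_records(universe, categories):
    """Return the records in the universe that belong to no category."""
    covered = set()
    for members in categories.values():
        covered |= set(members)
    return set(universe) - covered


# Toy data (hypothetical): existing category "W3" is not mapped at first.
existing = {"W1": {1, 2}, "W2": {2, 3}, "W3": {4}}
universe = {1, 2, 3, 4}
draft = induced_categories({"K1": ["W1", "W2"]}, existing)
missing = uncovered_records(universe, draft)

# Revising the correspondence so that "W3" is mapped restores the cover.
revised = induced_categories({"K1": ["W1", "W2"], "K2": ["W3"]}, existing)
still_missing = uncovered_records(universe, revised)
```

An empty result from `uncovered_records` means the new categories cover every record. In this sketch, mapping every existing category to at least one new category is enough to achieve that, provided the existing categories themselves cover the universe.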
  • 22. Case study • InCites™ (Clarivate Analytics) • A world-class research evaluation platform • Web of Science™ citation database • Web of Science classification scheme (251 categories) • Essential Science Indicators (ESI) classification scheme (22 categories) • Japanese users are eager to utilize the subject classification scheme of Japan’s largest national research grants, KAKENHI. • KAKEN (NII) research project database • Archival records of research projects and the outputs of KAKENHI grants in Japan. • KAKENHI subject classification scheme (hierarchical classification scheme; 4 categories, 10 areas, 67 disciplines, and 284 research fields)
  • 25. Developing a contingency table as evidence data • We identified the same bibliographic records in the WoS citation database as of 2009 and 2010 through a set of record linkage techniques to obtain a set of articles $S'$ that are classified using both the KAKENHI and WoS classification schemes. [Figure: bibliographic linkage between KAKEN (projects $T$, articles $a'_1, a'_2, a'_3$) and Web of Science (articles $a_1, a_2, a_3$ in $S$), with $a'_1 \equiv a_1$, $a'_2 \equiv a_2$, $a'_3 \equiv a_3$.]
  • 26. A contingency table between the WoS and KAKENHI subject classification schemes (a part of the table of 251 WoS categories × 67 KAKENHI disciplines)
  • 27. Analysis of the contingency table: the discrete generalized beta distribution (DGBD), $f(r) = A \dfrac{(N + 1 - r)^{b}}{r^{a}}$, where $r$ is the rank value, $N$ its maximum value, $A$ a normalization constant, and $a$ and $b$ two fitting parameters.
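For readers unfamiliar with the DGBD, a small sketch may help. It assumes the commonly cited rank-size form $f(r) = A (N + 1 - r)^{b} / r^{a}$, with $r$ the rank, $N$ the maximum rank, $A$ a normalization constant, and $a$, $b$ the two fitting parameters. The function name `dgbd` and the parameter values are hypothetical, and the fitting procedure itself is not shown.

```python
def dgbd(rank, max_rank, a, b, scale=1.0):
    """Evaluate the assumed DGBD form A * (N + 1 - r)**b / r**a.

    rank     -- rank value r, with 1 <= r <= N
    max_rank -- maximum rank N
    a, b     -- the two fitting parameters
    scale    -- normalization constant A
    """
    if not 1 <= rank <= max_rank:
        raise ValueError("rank must satisfy 1 <= rank <= max_rank")
    return scale * (max_rank + 1 - rank) ** b / rank ** a


# Hypothetical parameters: evaluate the curve at every rank from 1 to 10.
curve = [dgbd(r, 10, a=1.0, b=1.0) for r in range(1, 11)]
```

With positive `a` and `b` the curve decreases with rank: `a` controls the drop among the top ranks and `b` the drop in the tail.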
  • 29. Maximum pseudo $F_1$-measure for the third-level 67 disciplines of the KAKENHI subject categories against the 251 WoS subject categories
    The third-level 67 disciplines:
    Seq. no. | KAKENHI subject category | Translation | Cardinality | No. of WoS subject categories to get the max pseudo F1-measure | Pseudo precision | Pseudo recall | Max pseudo F1-measure
    (l3-01) | 情報学 | Informatics | 6637 | 17 | 0.576 | 0.626 | 0.600
    (l3-02) | 神経科学 | Brain sciences | 1570 | 1 | 0.218 | 0.365 | 0.273
    (l3-03) | 実験動物学 | Laboratory animal science | 242 | 1 | 0.059 | 0.074 | 0.066
    (l3-04) | 人間医工学 | Human informatics | 1995 | 8 | 0.222 | 0.213 | 0.217
    (l3-05) | 健康・スポーツ科学 | Health / sports science | 844 | 5 | 0.181 | 0.290 | 0.223
    (l3-06) | 生活科学 | Human life science | 467 | 4 | 0.239 | 0.281 | 0.258
    (l3-07) | 科学教育・教育工学 | Science education / educational technology | 388 | 2 | 0.377 | 0.103 | 0.162
    (l3-08) | 科学社会学・科学技術史 | Sociology / history of science and technology | 43 | 6 | 0.111 | 0.163 | 0.132
    (l3-09) | 文化財科学 | Cultural assets study | 55 | 1 | 0.200 | 0.036 | 0.062
    (l3-10) | 地理学 | Geography | 148 | 4 | 0.117 | 0.203 | 0.149
    (l3-11) | 環境学 | Environmental science | 2136 | 14 | 0.262 | 0.385 | 0.312
    (l3-12) | ナノ・マイクロ科学 | Nano / micro science | 1852 | 4 | 0.103 | 0.313 | 0.155
    (l3-13) | 社会・安全システム科学 | Social / safety system science | 868 | 14 | 0.187 | 0.214 | 0.199
    (l3-14) | ゲノム科学 | Genome science | 394 | 3 | 0.040 | 0.203 | 0.067
    (l3-15) | 生物分子科学 | Biomedical engineering | 875 | 2 | 0.119 | 0.325 | 0.174
    (l3-16) | 資源保全学 | Culture assets and museology | 172 | 3 | 0.181 | 0.145 | 0.161
    (l3-17) | 地域研究 | Area studies | 85 | 7 | 0.164 | 0.271 | 0.204
    (l3-18) | ジェンダー | Gender | 27 | 3 | 0.231 | 0.111 | 0.150
  • 30. The third-level 67 disciplines (continued):
    Seq. no. | KAKENHI subject category | Translation | Cardinality | No. of WoS subject categories to get the max pseudo F1-measure | Pseudo precision | Pseudo recall | Max pseudo F1-measure
    (l3-19) | 哲学 | Philosophy | 60 | 4 | 0.436 | 0.283 | 0.343
    (l3-20) | 芸術学 | Art Studies | 9 | 1 | 0.091 | 0.111 | 0.100
    (l3-21) | 文学 | Literature | 41 | 10 | 0.700 | 0.683 | 0.691
    (l3-22) | 言語学 | Linguistics | 239 | 3 | 0.705 | 0.410 | 0.519
    (l3-23) | 史学 | History | 82 | 6 | 0.412 | 0.341 | 0.373
    (l3-24) | 人文地理学 | Human Geography | 14 | 3 | 0.175 | 0.500 | 0.259
    (l3-25) | 文化人類学 | Cultural Anthropology | 38 | 3 | 0.056 | 0.105 | 0.073
    (l3-26) | 法学 | Law | 41 | 3 | 0.385 | 0.122 | 0.185
    (l3-27) | 政治学 | Politics | 59 | 2 | 0.409 | 0.458 | 0.432
    (l3-28) | 経済学 | Economics | 992 | 12 | 0.692 | 0.622 | 0.655
    (l3-29) | 経営学 | Management | 130 | 5 | 0.294 | 0.385 | 0.333
    (l3-30) | 社会学 | Sociology | 90 | 8 | 0.176 | 0.278 | 0.216
    (l3-31) | 心理学 | Psychology | 794 | 14 | 0.488 | 0.479 | 0.483
    (l3-32) | 教育学 | Education | 151 | 9 | 0.244 | 0.258 | 0.251
    (l3-33) | 数学 | Mathematics | 2589 | 4 | 0.734 | 0.792 | 0.762
    (l3-34) | 天文学 | Astronomy | 1005 | 1 | 0.505 | 0.870 | 0.639
    (l3-35) | 物理学 | Physics | 5199 | 6 | 0.498 | 0.651 | 0.565
    (l3-36) | 地球惑星科学 | Earth and Planetary Science | 2099 | 7 | 0.619 | 0.662 | 0.640
    (l3-37) | プラズマ科学 | Plasma Science | 508 | 1 | 0.233 | 0.191 | 0.210
    (l3-38) | 基礎化学 | Basic Chemistry | 2448 | 7 | 0.229 | 0.801 | 0.356
    (l3-39) | 複合化学 | Applied Chemistry | 3573 | 6 | 0.283 | 0.526 | 0.368
    (l3-40) | 材料化学 | Materials Chemistry | 1635 | 7 | 0.157 | 0.348 | 0.216
    (l3-41) | 応用物理学・工学基礎 | Applied Physics | 2235 | 5 | 0.170 | 0.394 | 0.238
    (l3-42) | 機械工学 | Mechanical Engineering | 2675 | 11 | 0.431 | 0.388 | 0.408
    (l3-43) | 電気電子工学 | Electrical and Electronic Engineering | 4875 | 10 | 0.338 | 0.669 | 0.449
    (l3-44) | 土木工学 | Civil Engineering | 711 | 8 | 0.371 | 0.484 | 0.420
    (l3-45) | 建築学 | Architecture and Building Engineering | 170 | 3 | 0.286 | 0.506 | 0.365
  • 31. The third-level 67 disciplines (continued):
    Seq. no. | KAKENHI subject category | Translation | Cardinality | No. of WoS subject categories to get the max pseudo F1-measure | Pseudo precision | Pseudo recall | Max pseudo F1-measure
    (l3-46) | 材料工学 | Material Engineering | 2931 | 6 | 0.348 | 0.523 | 0.418
    (l3-47) | プロセス工学 | Process/Chemical Engineering | 1283 | 4 | 0.145 | 0.306 | 0.197
    (l3-48) | 総合工学 | Integrated Engineering | 1465 | 8 | 0.256 | 0.309 | 0.280
    (l3-49) | 基礎生物学 | Basic Biology | 2423 | 7 | 0.375 | 0.400 | 0.387
    (l3-50) | 生物科学 | Biological Science | 2679 | 4 | 0.167 | 0.582 | 0.259
    (l3-51) | 人類学 | Anthropology | 300 | 3 | 0.315 | 0.440 | 0.367
    (l3-52) | 農学 | Plant Production and Environmental Agriculture | 899 | 4 | 0.307 | 0.449 | 0.365
    (l3-53) | 農芸化学 | Agricultural Chemistry | 1755 | 6 | 0.220 | 0.386 | 0.281
    (l3-54) | 林学 | Forest and Forest Products Science | 559 | 5 | 0.408 | 0.252 | 0.312
    (l3-55) | 水産学 | Applied Aquatic Science | 581 | 2 | 0.419 | 0.327 | 0.367
    (l3-56) | 農業経済学 | Agricultural Science in Society and Economy | 31 | 2 | 0.333 | 0.097 | 0.150
    (l3-57) | 農業工学 | Agro-Engineering | 216 | 4 | 0.157 | 0.259 | 0.195
    (l3-58) | 畜産学・獣医学 | Animal Life Science | 1190 | 4 | 0.511 | 0.387 | 0.440
    (l3-59) | 境界農学 | Boundary Agriculture | 541 | 4 | 0.235 | 0.148 | 0.181
    (l3-60) | 薬学 | Pharmacy | 3457 | 4 | 0.294 | 0.369 | 0.328
    (l3-61) | 基礎医学 | Basic Medicine | 5232 | 16 | 0.213 | 0.551 | 0.307
    (l3-62) | 境界医学 | Boundary Medicine | 850 | 12 | 0.162 | 0.112 | 0.132
    (l3-63) | 社会医学 | Society Medicine | 1065 | 8 | 0.282 | 0.262 | 0.271
    Averages: cardinality 1450.4; no. of WoS subject categories 6.1; pseudo precision 0.315; pseudo recall 0.367; pseudo F1-measure 0.317
  • 32. Miscellaneous considerations • Decision by an expert • Limit the number of correspondences to between 1 and 4 for each $O_i^1$. • For every Web of Science subject category $O_i^1$, the number of relations with KAKENHI subject categories $O_j^2$ is limited to at most 4. • For every Web of Science subject category $O_i^1$, once the recall rate exceeds one half, no more relations are added. • Check all correspondences between $O_i^1$ and $O_j^2$. • Add or remove correspondence relations between them by means of subject classification keywords.
  • 33. Inducing a correspondence between KAKENHI subject classification scheme and WoS subject classification scheme • 10 areas of KAKENHI (a part of the mapping list) • 67 disciplines of KAKENHI (a part of the mapping list) • Source: http://help.prod-incites.com/inCites2Live/filterValuesGroup/researchAreaSchema/kaken/version/16
  • 34. Example screen of InCites™ (a snapshot of 2018-12-14) • WoS Documents: 58,395,008 for Web of Science subject categories • WoS Documents: 3,192,449 for Web of Science subject categories limited to “LOCATION = JAPAN” • WoS Documents: 3,191,448 for KAKEN L3 subject categories limited to “LOCATION = JAPAN” • The bubbles represent the proportional numbers of articles classified using the KAKENHI subject categories.
  • 35. Top 30 subject distribution of Japanese authors’ articles with the two subject classification schemes (panels: WoS subject classification scheme; KAKENHI subject classification scheme)
  • 36. User feedback: Questions and answers on the validity of the KAKENHI subject classification scheme • KAKEN classification scheme • Released on InCites Benchmarking in April 2016 • User survey • Conducted in March 2017 by online questionnaire for active institutional users • 18 questions • Results • Feedback from 26 institutional users • Q7 • Which levels of hierarchy in the KAKENHI subject classification scheme do you need? • Q11 • Do you feel comfortable with your analysis results by the KAKENHI subject classification scheme in accordance with your experience? • Respondents by user role in the institution (multiple answers possible): RA (research administrator) 20; Administrator / officer 3; IR (institutional research) staff 5; Others 2 • Answers shown on the slide: Yes; Other: 1, “I need more detailed categories”
  • 37. Discussion (1) • Our approach, i.e. deciding a correspondence between two subject classification schemes, has an inherent limitation. • In the natural correlations between the subject categories of two subject classification schemes, each subject category of one scheme partly overlaps several subject categories of the other scheme. • There is no inclusion relationship between them. • Correspondence relations are probabilistic. • Research projects and journal articles have similarities and differences in subject. • Projects and articles are strongly correlated in subject. • However, they also differ in subject. • Projects precede articles. • Projects tend to indicate the central concept with essential keywords.
  • 38. Discussion (2) • Nevertheless, the classification results were accepted by InCites users. • Our approach requires less workload. • The number of journal titles in the Web of Science citation database is 24,688. • The number of Web of Science documents in InCites is 58,395,008. • The number of subject category pairs for which a correspondence must be decided is 16,817. • For KAKEN 67 - WoS 251, the number of pairs is 67 × 251 = 16,817. • For KAKEN 10 - WoS 251, the number of pairs is 10 × 251 = 2,510. • However, the evidence data are not sufficient for automatic decision making. • The sum of the frequency counts of the contingency table is 97,175. • Manual handling was needed.
  • 39. Conclusions and future work • Conclusions • We proposed an approach to apply a new subject classification scheme to a bibliographic database that is already classified under another subject classification scheme. • We gave a fundamental analytical model of a subject classification scheme based on set theory. • Formation of a compact topological space for the new subject classification scheme is a necessary condition. • An external database, e.g. a research project database, is used to induce a correspondence between the two subject classification schemes. • We applied the approach to a practical example, InCites™, a research evaluation tool based on the Web of Science citation database, to add the subject classification scheme of Japan’s largest national grants, KAKENHI. The user survey indicates that users generally accept the new function. • Future work • For a complex classification scheme such as a hierarchical classification scheme, our approach should be extended to accommodate its structure. • Alternatively, multilabel learning is another possible way to reach our goal. We need to compare it with our method.
  • 40. Acknowledgments • This presentation is a result of joint research between the National Institute of Informatics and Clarivate Analytics, Co., Ltd. Of the databases used in this presentation, the KAKEN database is provided by the National Institute of Informatics, Cyber Science Infrastructure Development Department, Scholarly and Academic Information Division, and the Web of Science citation database is provided by Clarivate Analytics, Co., Ltd. We are thankful to the organizations that let us use these valuable assets.
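
The two stopping rules on slide 32 (at most 4 relations per Web of Science subject category, and no further relations once the recall rate exceeds one half) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "recall rate" means the share of a WoS category's co-occurrence counts covered by the KAKENHI categories selected so far, and that candidates are visited from most to least frequent. The function name `select_relations`, the constants, and the counts in the usage example are hypothetical.

```python
from typing import Dict, List

MAX_RELATIONS = 4       # slide 32: at most 4 relations per WoS subject category
RECALL_THRESHOLD = 0.5  # slide 32: stop once the recall rate exceeds one half


def select_relations(counts: Dict[str, int]) -> List[str]:
    """Pick KAKENHI categories for one WoS category from co-occurrence counts.

    `counts` maps a KAKENHI category name to how often it co-occurs with the
    WoS category (assumed to be one row of the contingency table).
    """
    total = sum(counts.values())
    if total == 0:
        return []
    selected: List[str] = []
    covered = 0
    # Visit KAKENHI categories from the most to the least frequent (assumption).
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for category, count in ranked:
        # Stop at the relation limit or once coverage has exceeded one half.
        if len(selected) >= MAX_RELATIONS or covered / total > RECALL_THRESHOLD:
            break
        selected.append(category)
        covered += count
    return selected


# Made-up counts for illustration only; not data from the study.
example = {"Physics": 50, "Applied Physics": 30, "Mathematics": 15, "Astronomy": 5}
print(select_relations(example))
```

In this sketch the relations are only candidates. According to slide 32, an expert then checks every correspondence and adds or removes relations using subject classification keywords.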

Editor's Notes

  1. intersection, set difference, union,
  2. Another set of categories is unknown for the set S. We want to specify the compact topological space for S.
  3. As a data-driven approach, …
  4. Given a research project database, we can observe compact topological spaces for the two subject classification schemes.
  5. The strategy for inducing a correspondence is to maximize the F-measure.
  6. The pseudo metrics are different from the original metrics because of subadditivity.
  7. As a summary, …
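
As a small worked check of the F-measure mentioned in notes 5 and 6, the sketch below recomputes the max pseudo F1 measure for three rows of the third-level table from their pseudo precision and pseudo recall. It assumes that the pseudo F1 measure is the usual harmonic mean of the two, which the tabulated values are consistent with. How the pseudo precision and pseudo recall themselves are derived is not shown here. The function name `pseudo_f1` and the `ROWS` dictionary are hypothetical.

```python
def pseudo_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (pseudo precision, pseudo recall, reported max pseudo F1) copied from the
# third-level table: l3-19 Philosophy, l3-21 Literature, l3-33 Mathematics.
ROWS = {
    "Philosophy": (0.436, 0.283, 0.343),
    "Literature": (0.700, 0.683, 0.691),
    "Mathematics": (0.734, 0.792, 0.762),
}

for name, (p, r, reported) in ROWS.items():
    # Print the recomputed value next to the value reported on the slide.
    print(f"{name}: computed {pseudo_f1(p, r):.3f}, reported {reported:.3f}")
```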