Research project MAI 2
Final presentation - Group n. 4
S. Deckers - J. Hermans - A. Ludermann - D. Di Mitri - J. Rutten - D. Soemers
Research Project MAI 2 - Group n.4
Outline
● Our data
● Visualisations
● Analysing keywords
● Ontology
● Cluster articles
● Predicting citations
● Analysing raw materials
● Conclusions
● Improvements
Our data
Visualisations
Task 5: “Visualising the articles in a relevant context of time and
geographical location in 2D or 3D”
Task 5 - Results
Analysing Keywords
Task 1: “Determining combinations of keywords that are specific
for each year, country, journal, and subject category”
Task 6: “Extracting (combinations of) keywords from abstracts
and titles”
Task 1
• TF-IDF as feature extraction method.
• Treat objects of interest and their keywords as
documents.
• Extract relevant keywords by making use of a
threshold.
• Fast
Components (from diagram): Fetching model, Generic document processor, Combination model
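The thresholded TF-IDF idea on this slide can be sketched as follows. This is a minimal illustration, not the group's implementation: each object of interest (here, a year) and its keywords are treated as one document, and keywords scoring above a threshold are kept. The function name `tfidf_keywords`, the sample data, and the threshold value are hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(keyword_lists, threshold):
    """Map each object of interest to the keywords whose TF-IDF exceeds threshold.

    keyword_lists: dict from object name (e.g. a year) to its list of keywords.
    """
    n_docs = len(keyword_lists)
    # Document frequency: in how many "documents" each keyword occurs.
    df = Counter()
    for keywords in keyword_lists.values():
        df.update(set(keywords))
    result = {}
    for name, keywords in keyword_lists.items():
        tf = Counter(keywords)
        scores = {
            kw: (count / len(keywords)) * math.log(n_docs / df[kw])
            for kw, count in tf.items()
        }
        result[name] = sorted(kw for kw, s in scores.items() if s > threshold)
    return result

# Hypothetical sample: "film" occurs for every year, so its IDF is zero.
sample = {
    "1998": ["nanotube", "nanotube", "film"],
    "1999": ["quantum dot", "film"],
    "2000": ["film", "nanowire", "nanowire"],
}
specific = tfidf_keywords(sample, threshold=0.1)
```

Keywords shared by every object score zero and drop out, which is what makes the remaining keywords specific to a year, country, journal, or subject category.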
Task 6
1. Preprocess the abstracts (tokenization, stemming, removal of stopwords to reduce dimensionality).
2. Construct a vector space and a word mapping for every article abstract.
   Method: LDA (treating sentences as documents); TF-IDF seemed too naïve.
3. Apply LDA (k = 1) on the vector space to fetch a distribution over words.
4. Use the word mapping (index -> word) to extract relevant words.
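The four steps above can be sketched with the standard library only. This is an illustration under stated assumptions, not the group's code: the stopword list and the crude plural stripping are stand-ins for a real stopword list and stemmer, and instead of running a full LDA implementation the sketch uses the fact that, with a single topic (k = 1), the topic's word distribution reduces to smoothed relative word frequencies. All function names and the smoothing value `beta` are hypothetical.

```python
import re

# Tiny illustrative stopword list (assumption; the slides do not name one).
STOPWORDS = {"the", "of", "a", "an", "and", "in", "on", "by",
             "was", "were", "is", "are", "with", "to"}

def preprocess(sentence):
    """Step 1: lowercase, tokenize, drop stopwords, crude plural stripping."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    # Stand-in for a real stemmer: strip a trailing "s" from longer words.
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in kept]

def build_word_mapping(docs):
    """Step 2: index -> word list and word -> index dict."""
    index_to_word = sorted({t for doc in docs for t in doc})
    return index_to_word, {w: i for i, w in enumerate(index_to_word)}

def single_topic_distribution(docs, word_to_index, beta=0.01):
    """Step 3 shortcut for k = 1: smoothed relative word frequencies."""
    counts = [0] * len(word_to_index)
    for doc in docs:
        for token in doc:
            counts[word_to_index[token]] += 1
    total = sum(counts) + beta * len(counts)
    return [(c + beta) / total for c in counts]

def top_words(abstract, n=3):
    """Step 4: map the highest-probability indices back to words."""
    sentences = [s for s in re.split(r"[.!?]\s+", abstract) if s.strip()]
    docs = [preprocess(s) for s in sentences]  # sentences act as documents
    index_to_word, word_to_index = build_word_mapping(docs)
    dist = single_topic_distribution(docs, word_to_index)
    ranked = sorted(range(len(dist)), key=lambda i: (-dist[i], index_to_word[i]))
    return [index_to_word[i] for i in ranked[:n]]
```

For k > 1 the shortcut no longer holds and a topic-modelling library would be needed.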
Ontology
Task 2: “Specifying an application independent ontology of
publications.”
Task 7: “Defining ontology of the domain of nanotechnology
which should be linked to the ontology of publications made in
the first block.”
Task 8: “Automatically generating ontology for the publication
data. Compare this ontology with the one you defined
yourselves. Fill the ontology with data from the articles.”
Task 2: Result
Task 7: Result
Task 8: Approach
• Ontology Learning
  • Automatic or semi-automatic creation of ontologies
  • Requires text (or other data)
  • Often requires human supervision / corrections
• Approach
  • Accept single words from user input
  • Allow choice of different senses of word
  • Automatically generate related words
Cluster Articles
Task 4: “Learning article dendrograms and interpreting the
dendrogram clusters”
• Approach
• Analysing splitting at the root
Task 4: Approach
• Sample 8,000 articles from database
• Top-down hierarchical clustering
• K-Means on each level with K = 2
• Stop splitting when cluster small enough or dense enough
• Repeat N times and compare results
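A minimal sketch of the top-down procedure on this slide: split with K-Means (K = 2) at every level and stop when a cluster is small enough or dense enough. The stopping parameters `min_size` and `max_spread`, the density measure (mean squared distance to the centroid), and the farthest-point seeding of the second centre are assumptions made to keep the example short and stable; the slides do not specify them.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))

def spread(points):
    """Mean squared distance to the centroid (lower means denser)."""
    c = centroid(points)
    return sum(squared_distance(p, c) for p in points) / len(points)

def kmeans2(points, rng, iterations=20):
    """K-Means with K = 2; returns the two groups of points."""
    first = rng.choice(points)
    # Assumption: seed the second centre with the point farthest from the first.
    second = max(points, key=lambda p: squared_distance(p, first))
    centers = [first, second]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearer_second = (squared_distance(p, centers[1])
                             < squared_distance(p, centers[0]))
            groups[int(nearer_second)].append(p)
        if not groups[0] or not groups[1]:
            break
        centers = [centroid(g) for g in groups]
    return groups

def build_dendrogram(points, rng, min_size=4, max_spread=0.0):
    """Recursively bisect until clusters are small enough or dense enough."""
    node = {"size": len(points), "children": []}
    if len(points) <= min_size or spread(points) <= max_spread:
        return node
    left, right = kmeans2(points, rng)
    if left and right:
        node["children"] = [build_dendrogram(g, rng, min_size, max_spread)
                            for g in (left, right)]
    return node
```

Calling `build_dendrogram` with differently seeded `random.Random` instances gives the repeated runs that the last bullet compares.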
Task 4: Dendrogram
Task 4: Analysing split at root
• Year Features
  • 1998, 1999, 2000, 2001, 2002 (all 4x)
• Country Features
  • USA (4x), Japan (2x), Germany (1x), Peoples R. China (1x)
• Subject Features
  • Physics, Condensed Matter (4x)
  • Physics, Applied (3x)
  • Chemistry, Physical (3x)
  • Materials Science, Multidisciplinary (2x)
Predicting citations
Task 3: “Learning models that predict the citations of articles.”
Task 9: “Predicting the most cited authors.”
k-Nearest Neighbor classification
Initial approach (1)
• k-Nearest Neighbor with k = 1 (results were sufficient)
• Considered attributes:
  • Cited patents
  • Publication year
  • Countries
  • Subject category
  • Author affiliation origin
• Instance representation using a boolean array
• Cosine similarity
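A minimal sketch of 1-nearest-neighbour classification over boolean instance vectors with cosine similarity, as listed on this slide. The function names, the toy vectors, and the labels are hypothetical; the real instances have roughly 14,000 dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length boolean vectors."""
    dot = sum(x and y for x, y in zip(a, b))  # count of shared True positions
    na, nb = sum(a), sum(b)                   # squared norms of 0/1 vectors
    return dot / math.sqrt(na * nb) if na and nb else 0.0

def predict_1nn(train, query):
    """Return the label of the most similar training instance (linear search).

    train: list of (boolean_vector, label) pairs.
    """
    _, label = max(train, key=lambda item: cosine(item[0], query))
    return label

# Hypothetical toy data: four boolean attributes per article.
train = [
    ((True, False, True, False), "low"),
    ((False, True, False, True), "high"),
]
prediction = predict_1nn(train, (True, False, False, False))
```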
Initial approach (2)
Classification using four classes:
• 0: no citations
• 1-20: low number of citations
• 21-100: medium number of citations
• 101 and more: high number of citations
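The four bins can be written as a small mapping from a citation count to a class index. The indices 0 to 3 are assumed to follow the order in which the classes are listed; the function name `citation_class` is hypothetical.

```python
def citation_class(n_citations):
    """Map a citation count to one of the four classes listed above."""
    if n_citations < 0:
        raise ValueError("citation count cannot be negative")
    if n_citations == 0:
        return 0  # no citations
    if n_citations <= 20:
        return 1  # low
    if n_citations <= 100:
        return 2  # medium
    return 3      # high
```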
Problem!
• 189,508 data instances (valid data entries)
• ~ 14,000 dimensional space
• A boolean occupies one byte (the smallest addressable memory element)
~14 kB for every instance!
~2.7 GB to contain the complete dataset!
Solution
• Use the boolean nature of the instance representation!
• Address and modify bits using bitmasks.
~14 kB reduced to ~1.7 kB
~2.7 GB reduced to ~332 MB
Memory consumption reduced by a factor of 8.
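A minimal sketch of addressing single bits with bitmasks, together with the memory arithmetic from the slides. The class below is an illustration, not the group's BitSet implementation; the class and method names are hypothetical.

```python
class BitSet:
    """Fixed-size set of booleans packed eight to a byte."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.data = bytearray((n_bits + 7) // 8)

    def set(self, i):
        # Byte index is i // 8; the mask selects bit i % 8 inside that byte.
        self.data[i >> 3] |= 1 << (i & 7)

    def clear(self, i):
        self.data[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    def get(self, i):
        return bool(self.data[i >> 3] & (1 << (i & 7)))

N_INSTANCES = 189_508   # valid data entries (from the slides)
N_DIMENSIONS = 14_000   # approximate dimensionality (from the slides)

bytes_per_instance_unpacked = N_DIMENSIONS               # one byte per boolean
bytes_per_instance_packed = len(BitSet(N_DIMENSIONS).data)
total_unpacked = N_INSTANCES * bytes_per_instance_unpacked
total_packed = N_INSTANCES * bytes_per_instance_packed
```

With these figures a packed instance takes 1,750 bytes, and the packed dataset about 332 million bytes, one eighth of the unpacked size.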
Additional optimizations
Bit representation allows us to make more efficient use of
the CPU’s ALU.
Optimization of Cosine Similarity.
Increase in classification performance using linear search.
Original BitSet implementation
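One way the bit representation can speed up cosine similarity: for 0/1 vectors the dot product is the number of bits set in `a AND b`, and each squared norm is the number of set bits in the vector. The sketch below uses Python integers as bit vectors; it illustrates the idea and is not necessarily the optimization the group implemented.

```python
import math

def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def cosine_bits(a, b):
    """Cosine similarity of two bit vectors stored as integers."""
    na, nb = popcount(a), popcount(b)
    if na == 0 or nb == 0:
        return 0.0
    # One bitwise AND handles a whole machine word of attributes at a time.
    return popcount(a & b) / math.sqrt(na * nb)
```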
Task 3 - Results
• 10-fold cross-validation
• Avg. accuracy Class 0: 0.7908
• Avg. accuracy Class 1: 0.9943
• Avg. accuracy Class 2: 0.9823
• Avg. accuracy Class 3: 0.8175
• Total avg. accuracy: 0.8963
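For reference, a minimal sketch of how instances can be split for 10-fold cross-validation. The round-robin assignment of indices to folds is an assumption; the slides do not say how the folds were drawn.

```python
def k_fold_indices(n_instances, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n_instances, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

Each instance is used for testing exactly once, and the reported figures are averages over the ten folds.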
Task 9 - Results
Represent each author by his or her articles (instances), since authors cannot be uniquely identified.
Search for Class 3 instances.
Avg. accuracy for Class 3 classification: 0.7377
Analysing raw materials
Task 10: “Determining new substitutes of expensive raw
materials”
Task 10: Determining new substitutes of rare
raw materials
● Rare earth elements
○ group of 17 chemical elements
● 1. Find relevant documents
○ Abstracts that mention Rare Earth elements in some form
● 2. Analyse these documents for trends/patterns
Task 10: Finding Relevant Documents
● Regular Expressions
○ Can detect different ways of writing Rare Earths
● Full names
○ Yttrium / yttrium
● Chemical Formulae
○ Zr-Ce / YBa2Cu3O7+Ni / YSi1.7
● Some false positives
○ ZYMV-S (Zucchini Yellow Mosaic Virus)
○ especially for Yttrium
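A minimal sketch of the regular-expression approach, assuming two hypothetical patterns: one for full element names (case-insensitive) and one for element symbols inside formulae (a symbol not followed by a lowercase letter). These are not the group's expressions. The symbol pattern deliberately reproduces the false positive mentioned above: the "Y" in "ZYMV-S" is flagged as yttrium.

```python
import re

# The 17 rare earth elements: symbol -> full name.
RARE_EARTHS = {
    "Sc": "scandium", "Y": "yttrium", "La": "lanthanum", "Ce": "cerium",
    "Pr": "praseodymium", "Nd": "neodymium", "Pm": "promethium",
    "Sm": "samarium", "Eu": "europium", "Gd": "gadolinium", "Tb": "terbium",
    "Dy": "dysprosium", "Ho": "holmium", "Er": "erbium", "Tm": "thulium",
    "Yb": "ytterbium", "Lu": "lutetium",
}
NAME_TO_SYMBOL = {name: symbol for symbol, name in RARE_EARTHS.items()}

NAME_RE = re.compile(r"\b(" + "|".join(NAME_TO_SYMBOL) + r")\b", re.IGNORECASE)
# Longer symbols first so that "Yb" is tried before "Y".
_symbols = sorted(RARE_EARTHS, key=len, reverse=True)
SYMBOL_RE = re.compile("(" + "|".join(_symbols) + ")(?![a-z])")

def find_rare_earths(text):
    """Return the set of rare earth symbols mentioned by name or formula."""
    found = {m.group(1) for m in SYMBOL_RE.finditer(text)}
    found |= {NAME_TO_SYMBOL[m.group(1).lower()] for m in NAME_RE.finditer(text)}
    return found
```

Because such patterns over-match on acronyms, a second filtering step is needed to remove false positives.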
Task 10: TF-IDF approach 1/3
Task description: use TF-IDF to order the 190,692 publications according to the similarity of their abstracts with the Wikipedia article “Rare earth element”.
Background knowledge on rare earth elements
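Task 10: TF-IDF approach 2/3
Pipeline (from diagram): the query document (QueryDoc) and the corpus documents (Doc 0001.txt … Doc 192K.txt) each pass through text preprocessing.
The corpus yields the TF-IDF index A (n. docs × n. terms); the query yields the query vector b (n. query terms).
Similarity scores use a linear kernel: s = A × bᵀ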
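A minimal sketch of the linear-kernel ranking s = A × bᵀ in plain Python. The tokenizer, the smoothed IDF formula, and the L2 normalisation are assumptions (one common TF-IDF variant); the slides do not state which weighting was used. The function names and toy documents are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_index(docs):
    """Return the TF-IDF matrix A (one row per document) and a vectorizer."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    # Assumption: smoothed IDF, so that no term gets a zero or infinite weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    def vectorize(tokens):
        tf = Counter(t for t in tokens if t in idf)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    return [vectorize(doc) for doc in tokenized], vectorize

def rank(docs, query):
    """Order documents by similarity to the query: s = A x b^T."""
    A, vectorize = tfidf_index(docs)
    b = vectorize(tokenize(query))
    s = [sum(x * y for x, y in zip(row, b)) for row in A]
    order = sorted(range(len(docs)), key=lambda i: -s[i])
    return order, s
```

Here the query is projected onto the corpus vocabulary, so terms that appear only in the query do not contribute to the score.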
Task 10: TF-IDF approach 3/3
Example: first result, doc id 20350
The nano-grained Ni/ZrO2 catalysts containing rare earth element oxides were prepared by
oxidation-reduction pretreatment of amorphous Ni-(40-x) at% Zr-x at% rare earth element (Y,
Ce and Sm; x=1 - 10) alloy precursors. The conversion of carbon dioxide on the catalysts
containing 1 at% rare earth elements was almost the same as that on the rare earth element-
free catalyst, but the addition of 5 at% or more rare earth elements increased remarkably the
conversion at 473 K. In contrast to the formation of monoclinic and tetragonal ZrO2 during
pretreatment of amorphous Ni-Zr alloys containing 1 at% rare earth elements, tetragonal
ZrO2, which is generally stable only at high temperatures, was predominantly formed during
the pretreatment of the catalysts containing 5 at% or more rare earth elements. The surface
area of the catalysts increased with the content of rare earth element. Thus, the increase in
the surface area and stabilization of tetragonal ZrO2 seem to be responsible for the
improvement of catalytic activity of the Ni-Zr alloy-derived catalysts by the addition of rare
earth elements.
Task 10: Removing False Positives
● Compute similarity to the Wikipedia page on rare earth elements
○ TF-IDF vectors
● Reject documents with similarity score below threshold
● Conservative threshold (0.005)
○ filters some false positives
○ excludes few (if any) true positives
○ manually determined
Task 10: Analysis
● Top 3 countries
○ Saudi Arabia (15.74%), Slovenia (12.59%), Romania (9.13%)
● Top subject categories per Rare Earth element
● Rare Earth element trends over the years
● See report for detailed results
Task 10: Rare Earth substitution
● Search for articles that address substitution
● Lucene to search within RE abstracts (11,430)
● Search for “substitut*”, “replace*” or “alternative” (955)
● Filtered by sentences containing a chemical formula (841)
● Found no articles that directly address substitution
● but some refer e.g. to alternative methods, or to substitution as a chemical reaction
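The two filtering steps can be approximated with regular expressions. This is a stand-in sketch, not the Lucene queries the group ran: the term pattern mirrors the three search terms above, and the formula pattern (two or more element-like tokens in a row) is a hypothetical heuristic that also matches all-caps acronyms such as "XRD".

```python
import re

# Mirrors the search terms: substitut*, replace*, alternative.
SUBSTITUTION_RE = re.compile(r"\b(substitut\w*|replace\w*|alternative)\b",
                             re.IGNORECASE)
# Heuristic: at least two element-like tokens, e.g. "ZrO2" or "YBa2Cu3O7".
FORMULA_RE = re.compile(r"\b(?:[A-Z][a-z]?\d*(?:\.\d+)?){2,}\b")

def candidate_sentences(abstract):
    """Sentences that mention a substitution term and a chemical formula."""
    sentences = re.split(r"(?<=[.!?])\s+", abstract)
    return [s for s in sentences
            if SUBSTITUTION_RE.search(s) and FORMULA_RE.search(s)]
```

As the slide notes, sentences that pass both filters still have to be read manually, because "substitution" often refers to a chemical reaction rather than to replacing a raw material.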
Wrapping up
• Task 9:
  • Represent authors by the collection of their articles
  • k-Nearest Neighbor classification
  • High accuracy
• Task 10:
  • Two approaches to find Rare Earth articles:
    • similarity to the Wikipedia article with TF-IDF
    • regular expressions
  • Substitution: search for abstracts that address RE substitution directly
Conclusions
● variety of techniques for more insight
● ontologies and visualization
● most popular topics for years or countries
● predicting number of citations
Assistance for decision making,
e.g. which research areas to invest in
Improvements
• Improve RE substitution results by Machine Learning
techniques
• Need annotated data
• More advanced Machine Learning techniques for ontology
learning, e.g. clustering
Thank you for your attention.
Questions?
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...gajnagarg
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Recently uploaded (20)

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 

Research project MAI2 - Final Presentation Group 4

  • 1. Research project MAI 2, Final presentation - Group n. 4. Authors: S. Deckers, J. Hermans, A. Ludermann, D. Di Mitri, J. Rutten, D. Soemers
  • 2. Research Project MAI 2 - Group n.4. Outline:
      ● Our data
      ● Visualisation
      ● Analysing keywords
      ● Ontology
      ● Cluster articles
      ● Predicting citations
      ● Analysing raw materials
      ● Conclusion
      ● Improvements
  • 3. Our data
  • 4. Our data
  • 5. Visualisations. Task 5: “Visualising the articles in a relevant context of time and geographical location in 2D or 3D”
  • 6. Task 5 - Results
  • 7. Analysing Keywords. Task 1: “Determining combinations of keywords that are specific for each year, country, journal, and subject category” Task 6: “Extracting (combinations of) keywords from abstracts and titles”
  • 8. Task 1
      • TF-IDF as feature extraction method.
      • Treat objects of interest and their keywords as documents.
      • Extract relevant keywords by making use of a threshold.
      • Fast.
      • Diagram labels: Fetching model, Generic document processor, Combination model
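The Task 1 idea on slide 8 can be sketched as follows. This is a minimal illustration, not the group's implementation: the function name `tfidf_keywords`, the toy keyword lists and the threshold value 0.2 are all assumptions, and the TF-IDF variant used (relative term frequency times log inverse document frequency) is one common choice among several.

```python
import math
from collections import Counter


def tfidf_keywords(docs, threshold):
    """Return, per document, the terms whose TF-IDF weight exceeds `threshold`.

    `docs` maps an object of interest (e.g. a year) to its list of keywords,
    so each year, country, journal or category is treated as one document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    result = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        scores = {}
        for term, count in counts.items():
            tf = count / len(terms)             # relative term frequency
            idf = math.log(n_docs / df[term])   # 0 for terms found everywhere
            scores[term] = tf * idf
        result[name] = sorted(t for t, s in scores.items() if s > threshold)
    return result


# Hypothetical toy data: keywords grouped by publication year.
docs = {
    "1998": ["nanotube", "carbon", "nanotube", "film"],
    "1999": ["quantum", "dot", "film", "quantum"],
    "2000": ["film", "polymer", "polymer", "carbon"],
}
keywords = tfidf_keywords(docs, threshold=0.2)
```

Terms that occur in every document (here "film") get weight 0 and can never pass the threshold, which is what makes the surviving keywords specific to one year.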
  • 9. Task 6
      1. Preprocessing of abstracts (tokenization, stemming, removal of stopwords to reduce dimensionality).
      2. Construct vector space and word mapping for every article abstract.
          ○ LDA (treat sentences as documents).
          ○ TF-IDF seemed too naïve.
      3. Apply LDA (k = 1) on vector space to fetch distribution over words.
      4. Use the word mapping (index -> word) to extract relevant words.
  • 10. Ontology
      • Task 2: “Specifying an application independent ontology of publications.”
      • Task 7: “Defining ontology of the domain of nanotechnology which should be linked to the ontology of publications made in the first block.”
      • Task 8: “Automatically generating ontology for the publication data. Compare this ontology with the one you defined yourselves. Fill the ontology with data from the articles.”
  • 11. Task 2: Result
  • 12. Task 7: Result
  • 13. Task 8: Approach
      • Ontology Learning
          ○ Automatic or semi-automatic creation of ontologies
          ○ Requires text (or other data)
          ○ Often requires human supervision / corrections
      • Approach
          ○ Accept single words from user input
          ○ Allow choice of different senses of word
          ○ Automatically generate related words
  • 14. Cluster Articles. Task 4: “Learning article dendrograms and interpreting the dendrogram clusters”
      • Approach
      • Analysing the split at the root
  • 15. Task 4: Approach
      • Sample 8,000 articles from database
      • Top-down hierarchical clustering
      • K-Means on each level with K = 2
      • Stop splitting when cluster small enough or dense enough
      • Repeat N times and compare results
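The top-down procedure on slide 15 can be sketched as below. The names `two_means` and `build_tree`, the stopping parameters `max_size` and `max_radius`, and the 2D toy points are assumptions for illustration; the slides do not say how "small enough" or "dense enough" were measured, so a size limit and a maximum distance to the centroid stand in for them here.

```python
import random


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(axis) / len(points) for axis in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, rng, iterations=20):
    """Split `points` into two groups with plain K-Means (K = 2)."""
    centres = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centres[0]) <= sq_dist(p, centres[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. duplicate starting centres
        centres = [mean(groups[0]), mean(groups[1])]
    return groups


def build_tree(points, rng, max_size=3, max_radius=0.5):
    """Recursively bisect until a cluster is small enough or dense enough."""
    centre = mean(points)
    radius = max(sq_dist(p, centre) for p in points) ** 0.5
    node = {"size": len(points), "children": []}
    if len(points) <= max_size or radius <= max_radius:
        return node
    left, right = two_means(points, rng)
    if left and right:
        node["children"] = [
            build_tree(left, rng, max_size, max_radius),
            build_tree(right, rng, max_size, max_radius),
        ]
    return node


# Hypothetical data: two well-separated groups standing in for article vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
tree = build_tree(points, random.Random(0))
```

Because K-Means starts from randomly chosen centres, repeating the construction with different seeds and comparing the resulting trees corresponds to the "repeat N times" step.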
  • 16. Task 4: Dendrogram
  • 17. Task 4: Analysing split at root
      • Year Features
          ○ 1998, 1999, 2000, 2001, 2002 (all 4x)
      • Country Features
          ○ USA (4x), Japan (2x), Germany (1x), Peoples R. China (1x)
      • Subject Features
          ○ Physics, Condensed Matter (4x)
          ○ Physics, Applied (3x)
          ○ Chemistry, Physical (3x)
          ○ Materials Science, Multidisciplinary (2x)
  • 18. Predicting citations. Task 3: “Learning models that predict the citations of articles.” Task 9: “Predicting the most cited authors.” Method: k-Nearest Neighbor classification
  • 19. Initial approach (1)
      • k-Nearest Neighbor, with k = 1 (results are sufficient)
      • Considered attributes:
          ○ Cited patents
          ○ Publication year
          ○ Countries
          ○ Subject category
          ○ Author affiliation origin
      • Instance representation using a boolean array
      • Cosine similarity
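A minimal sketch of the classifier on slide 19: 1-nearest-neighbour over boolean feature arrays, compared by cosine similarity and found by linear search. The five feature positions and the two training instances are invented for illustration; the real instances had roughly 14,000 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length boolean arrays."""
    dot = sum(x and y for x, y in zip(a, b))  # features set in both arrays
    norm_a = math.sqrt(sum(a))                # for 0/1 data, |a|^2 = number of set features
    norm_b = math.sqrt(sum(b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_1nn(training, query):
    """Return the label of the most similar training instance (linear search)."""
    _, label = max(training, key=lambda item: cosine_similarity(item[0], query))
    return label


# Hypothetical features: [cites_patent, year_1998, year_1999, country_usa, country_japan]
training = [
    ([True, True, False, True, False], "1-20 citations"),
    ([False, False, True, False, True], "no citations"),
]
query = [True, True, False, False, True]
predicted = classify_1nn(training, query)
```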
  • 20. Initial approach (2). Classification using four classes:
      • 0: no citations
      • 1-20: low number of citations
      • 21-100: medium number of citations
      • 101 and more: high number of citations
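The four citation ranges on slide 20 map directly onto a small binning function. The class indices 0 to 3 are an assumption here, chosen to line up with the "Class 0" to "Class 3" labels used in the results slides.

```python
def citation_class(citations):
    """Map a citation count to one of the four classes from slide 20."""
    if citations < 0:
        raise ValueError("citation count cannot be negative")
    if citations == 0:
        return 0  # no citations
    if citations <= 20:
        return 1  # low
    if citations <= 100:
        return 2  # medium
    return 3      # high
```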
  • 21. Problem!
      • 189,508 data instances (valid data entries)
      • ~14,000-dimensional space
      • A boolean occupies one byte (the smallest addressable memory element)
      • ~14 kB for every instance!
      • ~2.7 GB to hold the complete dataset!
  • 22. Solution
      • Use the boolean nature of the instance representation!
      • Address and modify bits using bitmasks.
      • ~14 kB reduced to ~1.7 kB
      • ~2.7 GB reduced to ~332 MB
      • Memory consumption reduced by a factor of 8.
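The figures on slides 21 and 22 follow from simple arithmetic, and the bitmask idea can be sketched with a byte array. The class name `BitVector` is an assumption for illustration; the rounded dimension count of 14,000 is taken from the slide.

```python
class BitVector:
    """Fixed-size boolean array packed eight flags per byte."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray((size + 7) // 8)  # round up to whole bytes

    def set(self, index, value=True):
        mask = 1 << (index % 8)                 # selects one bit inside its byte
        if value:
            self.data[index // 8] |= mask
        else:
            self.data[index // 8] &= ~mask & 0xFF

    def get(self, index):
        return bool(self.data[index // 8] & (1 << (index % 8)))


INSTANCES = 189_508
DIMENSIONS = 14_000  # approximate, as stated on the slide

bytes_unpacked = DIMENSIONS              # one byte per boolean: ~14 kB
bytes_packed = (DIMENSIONS + 7) // 8     # one bit per boolean: ~1.7 kB
total_unpacked = INSTANCES * bytes_unpacked  # 2,653,112,000 bytes, ~2.7 GB
total_packed = INSTANCES * bytes_packed      # 331,639,000 bytes, ~332 MB

vector = BitVector(DIMENSIONS)
vector.set(13_999)
vector.set(5)
vector.set(5, False)
```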
  • 23. Additional optimizations
      • Bit representation allows us to make more efficient use of the CPU’s ALU.
      • Optimization of Cosine Similarity.
      • Increase in classification performance using linear search.
      • Figure label: Original BitSet implementation
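One way the cosine similarity can exploit a bit representation is sketched below; the slides do not show the group's actual optimization, so this is an assumption about its general shape. For 0/1 vectors the dot product equals the number of bits set in both operands, and each squared norm equals the number of set bits, so a bitwise AND plus a population count replaces the element-wise loop. Python integers stand in for the packed words.

```python
import math


def pack(bits):
    """Pack an iterable of booleans into one integer, bit i = element i."""
    value = 0
    for i, bit in enumerate(bits):
        if bit:
            value |= 1 << i
    return value


def popcount(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def cosine_similarity_bits(a, b):
    """Cosine similarity of two packed boolean vectors."""
    ones_a, ones_b = popcount(a), popcount(b)
    if ones_a == 0 or ones_b == 0:
        return 0.0
    shared = popcount(a & b)  # dot product of the two 0/1 vectors
    return shared / math.sqrt(ones_a * ones_b)


a = pack([True, True, False, True, False])
b = pack([True, True, False, False, True])
similarity = cosine_similarity_bits(a, b)
```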
  • 24. Task 3 - Results
      • 10-fold cross-validation
      • Avg. accuracy Class 0: 0.7908
      • Avg. accuracy Class 1: 0.9943
      • Avg. accuracy Class 2: 0.9823
      • Avg. accuracy Class 3: 0.8175
      • Total avg. accuracy: 0.8963
  • 25. Task 9 - Results
      • Represent each author by his or her articles (instances), since authors cannot be uniquely identified.
      • Search for Class 3 instances.
      • Avg. accuracy for Class 3 classification: 0.7377
  • 26. Analysing raw materials. Task 10: “Determining new substitutes of expensive raw materials”
  • 27. Task 10: Determining new substitutes of rare raw materials
      ● Rare earth elements
          ○ group of 17 chemical elements
      ● 1. Find relevant documents
          ○ Abstracts that mention Rare Earth elements in some form
      ● 2. Analyse these documents for trends/patterns
  • 28. Task 10: Finding Relevant Documents
      ● Regular Expressions
          ○ Can detect different ways of writing Rare Earths
      ● Full names
          ○ Yttrium / yttrium
      ● Chemical Formulae
          ○ Zr-Ce / YBa2Cu3O7+Ni / YSi1.7
      ● Some false positives
          ○ ZYMV-S (Zucchini Yellow Mosaic Virus)
          ○ especially for Yttrium
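The regular-expression search on slide 28 might look roughly like the sketch below. The patterns are assumptions, not the group's expressions: full names are matched case-insensitively, and element symbols are matched wherever they are not followed by a lowercase letter. That deliberately simple symbol rule reproduces the kind of false positive the slide mentions, because the single-letter symbol Y also matches inside acronyms.

```python
import re

# The 17 rare earth elements: scandium, yttrium and the 15 lanthanides.
RARE_EARTHS = {
    "scandium": "Sc", "yttrium": "Y", "lanthanum": "La", "cerium": "Ce",
    "praseodymium": "Pr", "neodymium": "Nd", "promethium": "Pm",
    "samarium": "Sm", "europium": "Eu", "gadolinium": "Gd", "terbium": "Tb",
    "dysprosium": "Dy", "holmium": "Ho", "erbium": "Er", "thulium": "Tm",
    "ytterbium": "Yb", "lutetium": "Lu",
}

NAME_RE = re.compile(r"\b(" + "|".join(RARE_EARTHS) + r")\b", re.IGNORECASE)
# A symbol counts when it is not followed by a lowercase letter, so "Y" in
# "YBa2Cu3O7" matches while "Y" in "Yttrium" is left to the name pattern.
SYMBOL_RE = re.compile(r"(" + "|".join(RARE_EARTHS.values()) + r")(?![a-z])")


def find_rare_earths(text):
    """Return the set of rare earth symbols mentioned by name or by symbol."""
    found = {RARE_EARTHS[m.group(1).lower()] for m in NAME_RE.finditer(text)}
    found.update(m.group(1) for m in SYMBOL_RE.finditer(text))
    return found
```

With these patterns, `find_rare_earths("ZYMV-S")` returns `{"Y"}` although the text is about a virus, which is why a second filtering step is useful.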
  • 29. Task 10 - TF-IDF approach 1/3. Task description: Use TF-IDF to order the 190,692 publications according to the similarity of their abstract with the Wikipedia article “Rare earth element” (background knowledge on Rare earth elements).
  • 30. Task 10: TF-IDF approach 2/3 (diagram). The query and the documents (Doc 0001.txt … Doc 192K.txt) each pass through text preprocessing, giving a query vector (n. query terms) and a TF-IDF index (n. docs x n. terms). Scores are computed with a linear kernel: s = A x b^T
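The scoring step in the diagram, s = A x b^T, can be sketched in plain Python as below. Reading A as the TF-IDF index and b as the query vector is an inference from the dimensions given on the slide. The function name `tfidf_rank`, the smoothed IDF (log ratio plus one) and the L2 normalisation are assumptions; with normalised rows the linear kernel equals cosine similarity.

```python
import math
from collections import Counter


def tfidf_rank(query_tokens, docs_tokens):
    """Rank documents by the linear kernel s = A x b^T on TF-IDF vectors."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))

    def vectorise(tokens):
        # Sparse, L2-normalised TF-IDF vector restricted to the index vocabulary.
        counts = Counter(t for t in tokens if t in df)
        weights = {t: c * (math.log(n_docs / df[t]) + 1.0) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        return {t: w / norm for t, w in weights.items()} if norm else {}

    index = [vectorise(tokens) for tokens in docs_tokens]   # rows of A
    query = vectorise(query_tokens)                         # b
    scores = [sum(w * query.get(t, 0.0) for t, w in row.items()) for row in index]
    ranking = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Hypothetical, already preprocessed abstracts and query text.
docs = [
    ["rare", "earth", "element", "oxide", "catalyst"],
    ["zucchini", "yellow", "mosaic", "virus"],
    ["yttrium", "oxide", "film"],
]
query = ["rare", "earth", "element", "yttrium"]
ranking, scores = tfidf_rank(query, docs)
```

A document that shares no term with the query scores exactly 0, so a small positive cut-off on the score is enough to discard it.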
  • 31. Task 10: TF-IDF approach 3/3. Example: first result, doc id 20350. “The nano-grained Ni/ZrO2 catalysts containing rare earth element oxides were prepared by oxidation-reduction pretreatment of amorphous Ni-(40-x) at% Zr-x at% rare earth element (Y, Ce and Sm; x=1 - 10) alloy precursors. The conversion of carbon dioxide on the catalysts containing 1 at% rare earth elements was almost the same as that on the rare earth element-free catalyst, but the addition of 5 at% or more rare earth elements increased remarkably the conversion at 473 K. In contrast to the formation of monoclinic and tetragonal ZrO2 during pretreatment of amorphous Ni-Zr alloys containing 1 at% rare earth elements, tetragonal ZrO2, which is generally stable only at high temperatures, was predominantly formed during the pretreatment of the catalysts containing 5 at% or more rare earth elements. The surface area of the catalysts increased with the content of rare earth element. Thus, the increase in the surface area and stabilization of tetragonal ZrO2 seem to be responsible for the improvement of catalytic activity of the Ni-Zr alloy-derived catalysts by the addition of rare earth elements.”
  • 32. Task 10: Removing False Positives
      ● Compute similarity to Wikipedia page on Rare Earth elements
          ○ TF-IDF vectors
      ● Reject documents with similarity score below threshold
      ● Conservative threshold (0.005)
          ○ filters some false positives
          ○ excludes few (if any) true positives
          ○ manually determined
  • 33. Task 10: Analysis
  • 34. Task 10: Analysis
      ● Top 3 countries
          ○ Saudi Arabia (15.74%), Slovenia (12.59%), Romania (9.13%)
      ● Top subject categories per Rare Earth element
      ● Rare Earth element trends over the years
      ● See report for detailed results
  • 35. Task 10: Rare Earth substitution
      ● Search articles that address substitution
      ● Lucene to search within RE abstracts (11,430)
      ● Search for “substitut*”, “replace*” or “alternative” (955)
      ● Filtered by sentences containing a chemical formula (841)
      ● Found no article that directly addresses substitution
      ● The matches refer instead to, e.g., alternative methods or substitution as a chemical reaction
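The two filtering steps on slide 35 can be approximated with regular expressions. The group used Lucene for the keyword search; the sketch below swaps in Python's `re` module as a stand-in and does not reproduce Lucene's behaviour. The formula pattern is a crude assumption (a token made of at least two element-like chunks such as "La", "Mn", "O3"), so capitalised acronyms would also match.

```python
import re

# Wildcard terms from the slide, expressed as regex prefixes.
KEYWORD_RE = re.compile(r"\b(?:substitut\w*|replace\w*|alternative)\b", re.IGNORECASE)
# Hypothetical formula heuristic: two or more chunks of an uppercase letter,
# an optional lowercase letter and an optional (possibly decimal) count.
FORMULA_RE = re.compile(r"\b(?:[A-Z][a-z]?(?:\d+(?:\.\d+)?)?){2,}\b")
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def substitution_sentences(abstract):
    """Return sentences that contain a substitution keyword and a formula."""
    sentences = SENTENCE_SPLIT_RE.split(abstract.strip())
    return [
        s for s in sentences
        if KEYWORD_RE.search(s) and FORMULA_RE.search(s)
    ]


# Invented example text, not taken from the dataset.
abstract = (
    "Ce was replaced by La in LaMnO3 films. "
    "An alternative route was studied. "
    "The yield improved."
)
matches = substitution_sentences(abstract)
```

The second sentence contains a keyword but no formula and is dropped, mirroring the reduction from 955 to 841 hits reported on the slide.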
  • 36. Wrapping up
      • TASK9:
          ○ Represent authors by collection of their articles
          ○ k-Nearest Neighbor classification
          ○ high accuracy
      • TASK10:
          ○ two approaches to find Rare Earth articles: similarity to Wikipedia article with TF-IDF, and regular expressions
          ○ substitution: search for abstracts that address RE substitution directly
  • 37. Conclusions
      ● variety of techniques for more insight
      ● ontologies and visualization
      ● most popular topics for years or countries
      ● predicting number of citations
      ● Assistance for decision making, e.g. which research areas to invest in
  • 38. Improvements
      • Improve RE substitution results by Machine Learning techniques
          ○ Need annotated data
      • More advanced Machine Learning techniques for ontology learning, e.g. clustering
  • 39. Thank you for your attention. Questions?