4. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Ontology Engineering Group
Ontologies LOT: Industrial Methodology
5. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Ontology Engineering Group
Ontologies
Knowledge Graphs
▪ Synchronous or asynchronous integration of heterogeneous data sources
▪ Data quality, cleaning and linking functions
▪ Linked Data Service publishing data or maven dependency
6. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Ontology Engineering Group
Ontologies
Knowledge Graphs
▪ Linguistic Linked Open Data
▪ Word Sense Disambiguation
▪ Named Entity Recognition
▪ Question Answering
NLP
Information Extraction
Knowledge-driven Exploration
Entity Linking
▪ Probabilistic Topic Models
▪ Taxonomies from corpora
▪ Large-scale Searching
▪ Classification of Out-of-Knowledge-Base Entities
KeyQ
7. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Ontology Engineering Group
Ontologies
Knowledge Graphs
▪ Creating KGs of Research Software Metadata
▪ Tracking FAIR principles in Research Software
NLP
Open Science
8. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Personal Background
[Figure: timeline with the years 2016, 2020, 2021 and 2022; topics: Probabilistic Topic Models, Clinical Knowledge Graphs, Hybrid QA, Fake News Detection; sector labels: Industry, Academy, Academy, Academy, Academy]
http://librairy.eu
9. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Call for Papers
▪ https://kgsum.github.io
▪ Topics
▪ Methods to summarize KGs
▪ KG features related to summaries
▪ Scope and Impact of KG summaries
▪ Call for Papers:
▪ Paper Submission: Jul 7 (23:59 AoE), 2023
▪ Notification to Authors: Jul 24, 2023
▪ Workshop Dates: Nov 6-7, 2023, in Athens, Greece.
12. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Cross-lingual Document Similarity
13. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Cross-lingual Document Similarity
▪ Three challenges in large-scale retrieval of documents from multi-lingual corpora:
▪ C1: Content representation
▪ C2: High-dimensional correlation matrix
▪ C3: Multi-lingual Comparison
[Figure: Patents and PhD Thesis documents in EN, ES and FR]
14. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Cross-lingual Document Similarity
▪ Probabilistic Topic Models [Blei et al., 2003]
▪ Each topic is a distribution over words
▪ Each word is drawn from one of those topics
▪ Each document is a mixture of corpus-wide topics
▪ Fixed-length vector of topic distributions
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(4–5), 993–1022.
15. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Cross-lingual Document Similarity
Distance Metrics
▪ Similar documents do not necessarily share the most relevant topic for each of them.
[Figure: two pairs of topic distributions, a) simJS = 0.74 and b) simJS = 0.71]
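As a rough illustration of the simJS scores above, the sketch below compares two topic distributions. It assumes simJS is 1 minus the Jensen-Shannon divergence computed with base-2 logarithms; the slide does not state the exact formula, and the example vectors are invented for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 add nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] with base-2 logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_similarity(p, q):
    """Assumed definition: similarity = 1 - Jensen-Shannon divergence."""
    return 1.0 - js_divergence(p, q)


# Two invented documents whose most relevant topics differ (topic 0 vs topic 1).
doc_a = [0.6, 0.3, 0.1]
doc_b = [0.3, 0.6, 0.1]
print(round(js_similarity(doc_a, doc_b), 3))
```

The two documents have different top topics yet still obtain a non-zero similarity, which is the point made on the slide.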
16. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Cross-lingual Document Similarity
▪ Hashing Topic Distributions [Badenes-Olmedo et al., 2019]
▪ Hierarchical set of topics based on their relevance
Badenes-Olmedo, C., Redondo-García, J. L., & Corcho, O. (2019). Large-Scale Semantic Exploration of Scientific Literature using Topic-based Hashing Algorithms. Semantic Web Journal.
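The idea of a hierarchical set of topics ordered by relevance can be sketched as follows. This is a simplified illustration, not the published algorithm: the function names, the fixed level sizes and the example vectors are all assumptions made here.

```python
def topic_hash(distribution, level_sizes=(1, 2)):
    """Group topic indices into relevance levels, most probable topics first.

    level_sizes is a hypothetical parameter: how many topics go into each level.
    """
    ranked = sorted(range(len(distribution)),
                    key=lambda t: distribution[t], reverse=True)
    levels, start = [], 0
    for size in level_sizes:
        levels.append(frozenset(ranked[start:start + size]))
        start += size
    return tuple(levels)


def shares_level(hash_a, hash_b):
    """True if both hashes have at least one topic in common at the same level."""
    return any(a & b for a, b in zip(hash_a, hash_b))


# Invented topic distributions over four topics.
doc_a = [0.05, 0.60, 0.25, 0.10]
doc_b = [0.10, 0.55, 0.05, 0.30]
print(topic_hash(doc_a), topic_hash(doc_b))
print(shares_level(topic_hash(doc_a), topic_hash(doc_b)))
```

Documents can then be bucketed by their hash levels, so candidate neighbours are found by set lookups instead of comparing every pair of full topic vectors.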
17. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Cross-lingual Document Similarity
▪ Computation can be treated as an approximate nearest neighbour (ANN) search problem [Mao et al., 2017] based on topic clusters.
Badenes-Olmedo, C., Redondo-García, J. L., & Corcho, O. (2019). Large-Scale Semantic Exploration of Scientific Literature using Topic-based Hashing Algorithms. Semantic Web Journal.
19. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Cross-lingual Document Similarity
▪ Multi-Lingual Dictionaries [Hao and Paul, 2018]
▪ more widely available than parallel corpora (e.g. PANLEX or Wiktionary)
▪ models are built from words in a target language
▪ dictionaries as a supervised method to align topics
▪ topics conditioned by pre-established language relations
20. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Cross-lingual Document Similarity
WordNet:
▪ It is a semantic network.
▪ Synonymous words are grouped into synsets.
▪ These synsets are then linked to other synsets via semantic relations, e.g. hypernym or hyponym.
Bond, Francis, P. Vossen, John P. McCrae and Christiane D. Fellbaum. “CILI: the Collaborative Interlingual Index.” Global WordNet Conference (2016).
23. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Cross-lingual Document Similarity
https://github.com/librairy/demo
24. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Cross-lingual Document Similarity
▪ Document Classification Task
▪ Metrics: precision, recall and F-measure
▪ Data: ~1k docs (monolingual, bilingual or multilingual documents)
▪ Methodology: comparison of clusters based on EUROVOC categories and annotations created by the model:
▪ supervised = Labeled LDA
▪ unsupervised = LDA
[Figure: bar charts of Precision and Recall (0-100) for en, es, fr, en-es, en-fr, es-fr and en-es-fr; series: supervised, unsupervised]
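For reference, the three classification metrics can be computed per document as in the sketch below. The category names are invented placeholders rather than real EUROVOC labels, and the function is a generic set-based definition, not the evaluation script used for these results.

```python
def precision_recall_f1(predicted, relevant):
    """Set-based precision, recall and F1 for one document's category labels."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: what the model assigned vs. the reference categories.
predicted = {"agriculture", "trade", "energy"}
relevant = {"trade", "energy", "transport", "finance"}
print(precision_recall_f1(predicted, relevant))
```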
25. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Cross-lingual Document Similarity
▪ Document Retrieval Task
▪ Metrics: precision@3, precision@5 and precision@10
▪ Data: ~1k docs (monolingual, bilingual or multilingual documents)
▪ Methodology: comparison of clusters based on EUROVOC categories and annotations created by the model:
▪ supervised = Labeled LDA
▪ unsupervised = LDA
[Figure: bar charts of Precision@3 and Precision@10 (0-100) for en, es, fr, en-es, en-fr, es-fr and en-es-fr; series: supervised, unsupervised]
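Precision@k, the retrieval metric reported here, is the fraction of the top k retrieved documents that are relevant. The sketch below shows the generic definition; the document identifiers are invented, and it is not the evaluation code behind the charts.

```python
def precision_at_k(ranked_docs, relevant_docs, k):
    """Fraction of the first k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for doc in ranked_docs[:k] if doc in relevant_docs)
    return hits / k


# Hypothetical ranking returned for one query document, and its relevant set.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d5"}
for k in (3, 5):
    print(k, precision_at_k(ranked, relevant, k))
```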
27. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Multiple and Heterogeneous QA (MuHeQA)
▪ Objective: facilitate access to information from KGs and unstructured data.
28. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Multiple and Heterogeneous QA (MuHeQA)
Badenes-Olmedo, C., & Corcho, O. (2023). MuHeQA: Zero-shot Question Answering over Multiple and Heterogeneous Knowledge Bases. Semantic Web Journal.
29. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Multiple and Heterogeneous QA (MuHeQA)
https://github.com/librairy/muheqa
30. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Multiple and Heterogeneous QA (MuHeQA)
▪ SPARQL queries used to extract the properties of a KG resource
[Figure: queries shown for Wikidata and DBpedia]
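The kind of query meant here can be sketched generically. The two helpers below only build query strings that list every property/value pair of a resource; they are an illustration written for this transcript, not the queries MuHeQA actually issues, and the function names and the example resource are assumptions.

```python
def wikidata_properties_query(entity_id, limit=100):
    """Query listing property/value pairs of a Wikidata entity such as 'Q42'."""
    # The wd: prefix is predefined by the Wikidata Query Service.
    return (f"SELECT ?property ?value WHERE {{ wd:{entity_id} ?property ?value . }} "
            f"LIMIT {limit}")


def dbpedia_properties_query(resource_name, limit=100):
    """Query listing property/value pairs of a DBpedia resource given its local name."""
    uri = f"http://dbpedia.org/resource/{resource_name}"
    return (f"SELECT ?property ?value WHERE {{ <{uri}> ?property ?value . }} "
            f"LIMIT {limit}")


print(wikidata_properties_query("Q42"))
print(dbpedia_properties_query("Douglas_Adams"))
```

The resulting strings could be sent to the public SPARQL endpoints of each KG with any HTTP or SPARQL client.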
31. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Multiple and Heterogeneous QA (MuHeQA)
▪ Performance identifying keywords in a question:
• Our method identifies the entities mentioned in a question along with the relevant terms discovered using PoS annotations.
32. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Multiple and Heterogeneous QA (MuHeQA)
▪ Performance when discovering Wikidata or DBpedia resources:
[Figure: results for Wikidata and DBpedia]
• We rule out building vector spaces in which each resource is represented by its labels [7], since one of our assumptions is to avoid supervised models that perform specific classification tasks over the KG (i.e. that need prior training).
• Our proposal does not require training datasets, since it performs textual searches based on the terms identified in the query, using an inverted index of the labels associated with the resources.
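A minimal sketch of label search over an inverted index, assuming whitespace tokenization and a simple count of matching terms as the score. The resource identifiers and labels are invented, and MuHeQA's actual index and ranking may differ.

```python
from collections import defaultdict


def build_label_index(resource_labels):
    """Map each lower-cased label token to the set of resources whose labels contain it."""
    index = defaultdict(set)
    for resource, labels in resource_labels.items():
        for label in labels:
            for token in label.lower().split():
                index[token].add(resource)
    return index


def search(index, terms):
    """Rank resources by how many query terms match their labels (ties by name)."""
    scores = defaultdict(int)
    for term in terms:
        for resource in index.get(term.lower(), ()):
            scores[resource] += 1
    return sorted(scores, key=lambda r: (-scores[r], r))


# Hypothetical resources and labels.
labels = {
    "ex:Madrid": ["Madrid", "City of Madrid"],
    "ex:RealMadrid": ["Real Madrid", "Real Madrid CF"],
    "ex:Spain": ["Spain", "Kingdom of Spain"],
}
index = build_label_index(labels)
print(search(index, ["Real", "Madrid"]))
```

No training step is involved: adding a resource only requires indexing its labels.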
33. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Multiple and Heterogeneous QA (MuHeQA)
▪ Performance based on Knowledge Graph-oriented QA:
• The results show that our approach performs close to the best system, STaF-QA, and better than other approaches specific to KGQA.
• However, one weak point is recall: our approach needs to improve how it elaborates responses. The answers are perhaps too straightforward, and we should work on constructing more complex responses.
34. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Multiple and Heterogeneous QA (MuHeQA)
▪ Performance based on Document-oriented QA:
• The answers created by our algorithm are not as elaborate as those in the evaluation dataset, which were created manually, and this penalises the performance of our system.
• For example, given the question "How many children were infected by HIV-1 in 2008-2009, worldwide?", the answer inferred by our system is "more than 400,000", while the correct answer is "more than 400,000 children were infected worldwide, mostly through MTCT and 90% of them lived in sub-Saharan Africa".
36. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Knowledge graph-driven Clinical Document Exploration (Drugs4Covid)
▪ Release of scientific documents on coronaviruses (useful in document retrieval, IE, and knowledge management tasks)
▪ First goal: make the scientific literature on coronaviruses useful for some of the immediate needs of hospital pharmacies (e.g. drug shortages, or interactions between chemical substances)
▪ Afterwards: provide an up-to-date knowledge base on coronaviruses extracted from scientific publications
Badenes-Olmedo, Carlos and Óscar Corcho. “Lessons learned to enable question answering on knowledge graphs extracted from scientific publications: A case study on the coronavirus literature.” Journal of Biomedical Informatics 142 (2023): 104382.
37. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Knowledge graph-driven Clinical Document Exploration (Drugs4Covid)
▪ RQ1: How to systematize the processing of scientific corpora to build knowledge graphs?
▪ RQ2: How to identify and standardize drugs, diseases and genes/proteins mentioned in scientific texts?
▪ RQ3: How to formally describe evidence based on the association of drugs, diseases, genes and proteins mentioned in the same paragraph of a scientific article?
▪ RQ4: How to relate drugs and diseases from the paragraphs where they are mentioned?
▪ RQ5: How to provide access to a knowledge graph, together with the content of a collection of scientific publications, through natural language queries?
38. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Knowledge graph-driven Clinical Document Exploration (Drugs4Covid)
▪ There is no common methodology for constructing knowledge graphs from the biomedical literature, but rather a series of steps or stages that recur across existing works.
▪ We propose a workflow that is also valid when information update cycles are short
o E.g. one-week update cycles
▪ This workflow addresses research question RQ1
40. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Knowledge graph-driven Clinical Document Exploration (Drugs4Covid)
• Fine-tuned the BioBERT model [Lee et al., 2020] to identify the following biomedical classes:
- Diseases: BC5CDR-Diseases and NCBI-Diseases
- Chemicals: BC4CHEMD and BC5CDR-Drugs
- Genetics: JNLPBA and BC2GM
• Unique representation of each concept from a set of related terms composed using multiple information sources
- 7.4 million annotations in JSON format
• Multiple sources were taken into account to create a database for each of the biomedical entities
• These models address research question RQ2
42. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Knowledge graph-driven Clinical Document Exploration (Drugs4Covid)
• First, identify the requirements to express biomedical concepts and the associations between them.
- Unified Medical Language System (UMLS)
- Several efforts have been made to integrate biomedical knowledge into a single shared representation space (e.g. DISNET platform)
• Then, create an ontology that describes:
- (i) biomedical concepts and associations between them and
- (ii) the evidence supporting these associations.
• Challenges:
- Biomedical taxonomies and vocabularies with reduced semantics (e.g. SNOMED, ICD-10, UMLS...)
- How are the concepts related?
43. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Knowledge graph-driven Clinical Document Exploration (Drugs4Covid)
• We created the Evidences for BiOmedical Concepts Association (EBOCA) ontology to describe:
- biomedical concepts
- associations between them
- evidence supporting these associations.
• It defines the conceptual model on which the Drugs4Covid knowledge graph is built
• It is composed of two modules: EBOCA SEM-DISNET, oriented toward describing biomedical concepts and associations, and EBOCA Evidences, focused on representing evidence of these associations with metadata and provenance information
https://w3id.org/eboca/portal
Perez, Andrea Alvarez, Ana Iglesias-Molina, Lucia Prieto Santamaria, Maria Poveda-Villalon, Carlos Badenes-Olmedo and Alejandro Rodriguez-Gonzalez. “EBOCA: Evidences for BiOmedical Concepts Association Ontology.” International Conference Knowledge Engineering and Knowledge Management (2022).
44. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Knowledge graph-driven Clinical Document Exploration (Drugs4Covid)
EBOCA SEM-DISNET module:
• Designed to represent associations of common biomedical concepts, such as diseases, phenotypes, genes, genetic variants, biological pathways, drugs, proteins, and targets.
• Associations link pairs of concepts, for example, the gene-disease or drug-disease association
• Adds semantics to the DISNET structure
- phenotypic layer
- biological layer
- pharmacological layer
45. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Knowledge graph-driven Clinical Document Exploration (Drugs4Covid)
EBOCA EVIDENCES module:
• Extends the associations between biomedical concepts of the SEM-DISNET module with metadata and provenance information
• The evidence for these associations may come from known curated sources, or may be drawn or inferred directly from the texts.
• Describes in more detail the type of evidence supporting the association and the agents involved in its extraction and publication
46. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
• The EBOCA ontology addresses research question RQ3.
• The evaluation of an ontology usually seeks to identify inconsistencies or formal errors that invalidate its definition. However, since this work is oriented to the creation of a knowledge graph from a collection of scientific articles, it is more interesting to focus on an evaluation that measures how well the ontology covers a set of competency questions
» 15 questions associated with the EBOCA SEM-DISNET module
» and 10 questions associated with the EBOCA Evidences module
48. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Knowledge graph-driven Clinical Document Exploration (Drugs4Covid)
▪ Objective: annotate the entities, their relationships and the evidence according to the EBOCA ontology
▪ Methodology:
- The mapping rules between the annotations and the ontology resources were created with Mapeathor and executed by the Morph-KGC library.
- The articles were then described according to the ontology in an RDF file.
- Finally, GraphDB was chosen to store the RDF content and Helio to provide a SPARQL access interface
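To show the shape of the annotation-to-RDF step, here is a plain-Python stand-in. In the pipeline described above this is done declaratively with Mapeathor rules executed by Morph-KGC; the namespace, the `mentions` property and the JSON field names below are all hypothetical and do not come from EBOCA.

```python
EX = "http://example.org/"  # hypothetical namespace, not the EBOCA one


def annotation_to_triples(annotation):
    """Turn one paragraph annotation into N-Triples lines, one per mentioned concept."""
    paragraph = f"<{EX}paragraph/{annotation['paragraph_id']}>"
    triples = []
    for kind in ("drugs", "diseases"):
        for concept_id in annotation.get(kind, []):
            concept = f"<{EX}{kind}/{concept_id}>"
            triples.append(f"{paragraph} <{EX}mentions> {concept} .")
    return triples


# Invented annotation record for a single paragraph.
annotation = {"paragraph_id": "p1", "drugs": ["d1"], "diseases": ["c1", "c2"]}
for line in annotation_to_triples(annotation):
    print(line)
```

The emitted lines are valid N-Triples, so a file built this way could be loaded into a triple store.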
52. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
▪ Objective: facilitate access to the resources created in this work, especially the knowledge graph.
▪ Methodology:
o There are multiple ways to exploit a KG: predefined SPARQL queries, guided document searches, etc.
o Our proposal is a question-answering (QA) system based on natural language, MuHeQA
» combines Extractive QA and NLP with KGs
» Summarization, Evidence Extraction and Answer Generation
▪ https://drugs4covid.oeg.fi.upm.es/services/bio-qa
54. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Multi-Dimensional Fake News Identification
“...In this context the specialist said that what we eat is the main cause of cancer risk, with a *NUMBER* percent; in front of tobacco consumption, responsible in a *NUMBER* percent; and infections, with *NUMBER* percent…”
▪ Multi-Dimensional Fake News Identification: Knowledge Graph, 5Ws evidences, Dictionary, Social Network impact
55. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Inconsistency Detection
▪ Inconsistency Detection: social media, political parties, specific issues, over time
Guillen-Pancho, Ibai, Badenes-Olmedo, Carlos and Óscar Corcho. “Enabling complex question support in hybrid knowledge bases.” (2023)
56. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Conversational-assisted Topic Labeling
▪ Conversational-assisted Topic Labeling: Words Selection, Question Composition, Question Answering, Label Retrieval
Ramón-Ferrer, Virginia, Badenes-Olmedo, Carlos and Óscar Corcho. “Automatic Topic Label Generation Using Conversational Models.” (2023)
57. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Multi-hop KGQA
Liu-Chen, Teng, Badenes-Olmedo, Carlos and Óscar Corcho. “Enabling complex question support in hybrid knowledge bases.” (2023)
Min, Sewon, Victor Zhong, Luke Zettlemoyer and Hannaneh Hajishirzi. “Multi-hop Reading Comprehension through Question Decomposition and Rescoring.” ArXiv abs/1906.02916 (2019)
59. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Additional Content
▪ Ontology Engineering Framework
▪ Knowledge Graph Tools
▪ NLP Resources
▪ Open Science
60. OEG Ontology Engineering Framework
LOT industrial methodology
http://lot.linkeddata.es
+20 projects
More details at: Poveda-Villalón, M., Fernández-Izquierdo, A., Fernández-López, M., & García-Castro, R. (2022). LOT: An industrial oriented ontology engineering framework. Engineering Applications of Artificial Intelligence, 111, 104755. https://doi.org/10.1016/j.engappai.2022.104755
61. OEG Ontology Engineering Framework
LOT adoption
▪ +20 projects (internal and external)
▪ https://lot.linkeddata.es/#stories (selected examples)
64. OEG Ontology Engineering Framework
Technology landscape
• https://chowlk.linkeddata.es
• Notation and converter
• Hosted by OEG
• Developed by OEG
65. OEG Ontology Engineering Framework
Technology landscape
• https://lov.linkeddata.es
• Vocabulary registry and index
• Hosted by OEG
• Not developed by OEG
66. OEG Ontology Engineering Framework
Technology landscape
• http://oops.linkeddata.es/
• Check for pitfalls (41 defined, 31 automated)
• Online app
• Web service
• http://themis.linkeddata.es
• Validation based on ontology unit tests
• Online app
• Web service
67. OEG Ontology Engineering Framework
Technology landscape
• https://github.com/dgarijo/Widoco/
• Ontology documentation
• Desktop app (maintained by Daniel Garijo, ISI, California)
• Web service (OEG)
68. OEG Ontology Engineering Framework
Technology landscape
• https://github.com/oeg-upm/vocab.linkeddata.es
• Ontology portal generation
• jar distribution
73. OEG Ontology Engineering Framework
Technology landscape
• https://helio.linkeddata.es/
• RDF from heterogeneous data sources
• Java? Web service? API?
• Synchronous or asynchronous integration of heterogeneous data sources
• Data quality, cleaning and linking functions
• Linked Data Service publishing data or maven dependency
74. OEG Ontology Engineering Framework
Technology landscape
• https://astrea.linkeddata.es
• Generation of SHACL shapes from ontologies
• Validation of data using SHACL shapes
• Online service
77. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Drugs4Covid
https://drugs4covid.oeg.fi.upm.es/
https://github.com/drugs4covid
78. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Añotador
https://annotador.oeg.fi.upm.es/
https://github.com/mnavasloro/Annotador
79. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
TermitUp
https://termitup.oeg.fi.upm.es/
80. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
KeyQ
https://aiproc.linkeddata.es/
81. “NLP and Knowledge Graphs” - carlos.badenes@upm.es
Methods for Knowledge-Based Systems and Deep Learning Integration
Semantic-based Initialization for OOKB entities
[Figure: pipeline in three phases (PHASE 1, PHASE 2, PHASE 3). Labels: KG dumps, Endpoint, KG ontology, Entity, Entity ontological information, Entity embedding, Ontology embedding, Ontological information embeddings, Embedding composition, Initial embedding. Class hierarchy examples: Thing, Place, Country; Thing, Person]
Amador-Domínguez, E., Serrano, E., Manrique, D., Hohenecker, P., and Lukasiewicz, T. (2021). An ontology-based deep learning approach for triple classification with out-of-knowledge-base entities. Information Sciences, 564, 85–102.
83. scalable cross-lingual document similarity
Open Science: Creating KGs of Research Software Metadata
▪ Readme Analysis
o Supervised classification
o Regular expressions
o Header analysis
▪ File exploration
o Notebooks
o Dockerfiles
o Documentation
▪ GitHub API
[Figure: Repository, Extraction, Results (Metadata)]
https://github.com/KnowledgeCaptureAndDiscovery/somef/
Kelley, A., & Garijo, D. (2021). A framework for creating knowledge graphs of scientific software metadata. Quantitative Science Studies, 1-37.
84. scalable cross-lingual document similarity
Open Science: Tracking FAIR principles in Research Software
- Continuous updates to Research Software catalogs of tools
- Automated feedback on compliance with best practices, based on software metadata extraction
- Linking research articles with their corresponding software tools
https://software.oeg.fi.upm.es/