Topic modeling techniques have been applied in many scenarios in recent years, spanning textual content as well as many other data sources. Existing research in this field continuously tries to improve the accuracy and coherence of the results. Some recent works propose new methods that incorporate the semantic relations between words into the topic modeling process by employing vector embeddings over knowledge graphs.
In our recent paper, presented at the AAAI Spring Symposium 2019 held at Stanford University, we studied how knowledge graph embeddings affect topic modeling performance on textual content. In particular, the objective of the work is to determine which aspects of knowledge graph embedding have a significant and positive impact on the accuracy of the extracted topics.
We improve on the state of the art by integrating some advanced graph embedding approaches, specifically designed for knowledge graphs, into the topic extraction process. We also study how the knowledge base can be expanded with dataset-specific relations between words.
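To give a flavor of how embeddings can inform topic extraction, the toy sketch below re-ranks a topic's candidate words by their cosine similarity to the topic's embedding centroid, so that semantically coherent words rise to the top. This is only an illustrative assumption, not the method from the paper: the tiny two-dimensional `embeddings` dictionary stands in for vectors that would actually be produced by a knowledge graph embedding method.

```python
from math import sqrt

# Hypothetical toy embeddings; in practice these would come from a
# knowledge graph embedding method trained on a real knowledge base.
embeddings = {
    "dog":   [0.9, 0.1],
    "cat":   [0.8, 0.2],
    "car":   [0.1, 0.9],
    "truck": [0.2, 0.8],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def topic_centroid(words):
    """Mean embedding of the topic's candidate words."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[w] for w in words]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def rerank_topic(words):
    """Order words by similarity to the topic centroid, most coherent first."""
    c = topic_centroid(words)
    return sorted(words, key=lambda w: cosine(embeddings[w], c), reverse=True)

print(rerank_topic(["dog", "cat", "car"]))  # "cat" and "dog" outrank the outlier "car"
```

In a full pipeline, a score like this would typically be combined with the statistical topic-word weights rather than replace them.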
We implemented the method and validated it through a set of experiments covering 2 variations of the knowledge base, 7 embedding methods, and 2 methods for incorporating the embeddings into the topic modeling framework, also considering different parametrizations of the number of topics and of the embeddings.
Besides the specific technical results, the work also aims at showing the potential of integrating statistical methods with knowledge-centric methods. The full extent of the impact of these techniques will be explored in future work.
The details of the work are reported in the paper, which is available online, and in the slides, also available online on SlideShare.