Difficult to answer these questions:
• What are the different aspects of this topic?
• Are there related aspects missing in my search terms?
• Who are the most prominent authors about this topic?
• Which journals publish most about this topic?
• How have others — e.g. librarians — described and classified
How do we do this?
• OFFLINE: generates a semantic representation
for each entity
• ONLINE: finds the most related entities and
uses multidimensional scaling to display them
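The online step can be sketched as follows. This is only an illustration, not the system's actual code: it assumes relatedness is measured by cosine similarity and uses classical (Torgerson) multidimensional scaling. The function names `top_related` and `classical_mds`, the toy matrix `semantic`, and all sizes are hypothetical.

```python
import numpy as np

def top_related(vectors, query, k):
    """Return indices of the k rows most cosine-similar to row `query`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[query]
    order = np.argsort(-sims)  # most similar first
    return [int(i) for i in order if i != query][:k]

def classical_mds(vectors, dims=2):
    """Classical MDS: low-dimensional coordinates that approximately
    preserve the Euclidean distances between the rows of `vectors`."""
    sq = np.sum(vectors ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * vectors @ vectors.T  # squared distances
    n = len(vectors)
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:dims]  # largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Toy stand-in for the precomputed (offline) semantic representation.
rng = np.random.default_rng(0)
semantic = rng.normal(size=(50, 16))

neighbours = top_related(semantic, query=0, k=10)
coords = classical_mds(semantic[[0] + neighbours])  # 2-D layout for display
```

Each row of `coords` gives a screen position for the query entity or one of its neighbours, so entities with similar representations land close together.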
Build semantic representation
• Basic assumptions
– An entity can be represented by its context
– Entities which share more context are more likely
to be related
• Context is the textual environment where an
• The effects of state prekindergarten programs on young
children’s school readiness in five states
• [author:jung kwanghee]
• [subject:readiness for school]
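A minimal sketch of how such a record might contribute to an entity's context, assuming the context is simply the set of title words that co-occur with the entity. The record layout, the stopword list, the whitespace tokenizer and the second record are all invented for illustration; the topical terms actually used may well be phrases rather than single words.

```python
from collections import Counter, defaultdict

# Hypothetical record layout: an article title plus the entities attached to it.
records = [
    {
        "title": "The effects of state prekindergarten programs on young "
                 "children’s school readiness in five states",
        "entities": ["author:jung kwanghee", "subject:readiness for school"],
    },
    {
        # Invented toy record, only here to show shared context.
        "title": "School readiness and early reading skills",
        "entities": ["subject:readiness for school"],
    },
]

STOPWORDS = {"the", "of", "on", "in", "and"}  # assumed, deliberately tiny

def tokenize(title):
    """Lowercase, split on whitespace and drop stopwords."""
    return [w for w in title.lower().split() if w not in STOPWORDS]

def build_cooccurrence(records):
    """Count, for every entity, the terms it co-occurs with."""
    counts = defaultdict(Counter)
    for rec in records:
        terms = tokenize(rec["title"])
        for entity in rec["entities"]:
            counts[entity].update(terms)
    return counts

cooc = build_cooccurrence(records)
```

Each `Counter` in `cooc` corresponds to one row of an entity-by-term co-occurrence matrix; entities whose rows overlap heavily share more context.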
● ArticleFirst, 65 million articles
● Selected 4 million entities (topical terms,
authors, ISSNs, Dewey decimal codes)
● Represented by 1 million topical terms
But a matrix of 4M x 1M is too big to process
Dimension reduction based on Random Projection
C: the co-occurrence matrix
R: a random matrix with entries +1 or -1
C’: the approximation of C after random projection (the semantic matrix)
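A small numerical sketch of this reduction, under stated assumptions: C is multiplied by a random +/-1 matrix R to give a much narrower C’. The toy sizes, the reduced dimension `k`, the Poisson-generated counts and the 1/sqrt(k) scaling are choices made for this example only, not values taken from the system.

```python
import numpy as np

rng = np.random.default_rng(42)
n_entities, n_terms, k = 200, 5000, 256  # toy sizes; k is an assumed target dimension

# C: toy sparse-ish co-occurrence counts (entities x terms).
C = rng.poisson(0.05, size=(n_entities, n_terms)).astype(float)

# R: random matrix whose entries are +1 or -1 with equal probability.
R = rng.choice([-1.0, 1.0], size=(n_terms, k))

# C': each entity is now described by k numbers instead of n_terms.
# The 1/sqrt(k) factor keeps vector lengths comparable; it does not affect cosines.
C_reduced = (C @ R) / np.sqrt(k)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

original = cosine(C[0], C[1])
projected = cosine(C_reduced[0], C_reduced[1])
print(f"cosine before: {original:.3f}, after projection: {projected:.3f}")
```

The two printed similarities should be close but not identical: the projection trades a small amount of accuracy for a matrix that is far narrower than the original.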
• An extremely fast way of navigating large scale
• Generalisable to different datasets
– Full WorldCat
– Small but highly curated astrophysics dataset
• Supports explorative information retrieval and
• Koopman, Rob, and Shenghui Wang. 2014. “Where Should I Publish? Detecting
Journal Similarity Based on What Has Been Published There.” In Proceedings of
Digital Libraries 2014, 483–484. London, United Kingdom. Association for
Computing Machinery.
• Koopman, Rob, Shenghui Wang, Andrea Scharnhorst, and Gwenn Englebienne.
2015. “Ariadne’s Thread — Interactive Navigation in a World of Networked
Information”. In CHI '15 Extended Abstracts on Human Factors in Computing
Systems. ACM, Seoul, South Korea.
• Koopman, Rob, Shenghui Wang and Andrea Scharnhorst. 2015. “Contextualization
of topics - browsing through terms, authors, journals and cluster allocations”. In
Proceedings of 15th International Conference on Scientometrics & Informetrics.
Istanbul, Turkey.