SlideShare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.
SlideShare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.
Successfully reported this slideshow.
Activate your 14 day free trial to unlock unlimited reading.
Exploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadata
1.
Shenghui Wang
Rob Koopman
Exploring a world of
networked information
built from free-text
metadata
OCLC Research EMEA
ELAG2015
2.
What would you do if you are
interested in a topic?
3.
Difficult to answer these questions:
• What are the different aspects of this topic?
• Are there related aspects missing in my search terms?
• Who are the most prominent authors about this topic?
• Which journals publish most about this topic?
• How have others — e.g. librarians — described and classified
this topic?
5.
How do we do this?
• OFFLINE: generates a semantic representation
for each entity
• ONLINE: finds the most related entities and
using multidimensional scaling to display
6.
Build semantic representation
• Basic assumptions
– Entities can be represented by its context
– Entities which share more context are more likely
to be related
• Context is the textual environment where an
entity occurs
• The effects of state prekindergarten programs on young
children’s school readiness in five states
• [author:jung kwanghee]
• [subject:readiness for school]
7.
Dataset
● ArticleFirst, 65 million articles
● Selected 4 million entities (topical terms,
authors, ISSNs, Dewey decimal codes)
● Represented by 1 million topical terms
But a matrix of 4M x 1M is too big to process
8.
Dimension reduction based on Random Projection
C: a co-occurrence matrix
R: a random matrix of +/-1
C’: approximation of C
after random projection
-- Semantic matrix
9.
Online interface
• Find mutual nearest neighbors
• Use multidimensional scaling to display
12.
Possible applications
• Explorative interface
• Context based search:
– brain
• Journal finder
– Arctic ice journals
– http://brain.oxfordjournals.org/
• Author name disambiguation
– pre kindergarten
13.
Context matters!
• What does “young” mean in
- AritcleFirst
- WorldCat
- Astrophysics
- Art
14.
Ariadne
(demo) http://thoth.pica.nl/relate
• An extremely fast way of navigating large scale
hetereogeneous entities
• Generalisable to different datasets
– Full WorldCat
– Small but highly curated astrophysics dataset
• Supports explorative information retrieval and
entity disambiguation
15.
References
• Koopman, Rob, and Shenghui Wang. 2014. “Where Should I Publish? Detecting
Journal Similarity Based on What Has Been Published There.” In Proceedings of
Digital Libraries 2014, 483–484. London, United Kingdom. Association for
Computing Machinery. Paper, Poster
• Koopman, Rob, Shenghui Wang, Andrea Scharnhorst, and Gwenn Englebienne.
2015. “Ariadne’s Thread — Interactive Navigation in a World of Networked
Information”. In CHI '15 Extended Abstracts on Human Factors in Computing
Systems. ACM, Seoul, South Korea. Paper, Poster
• Koopman, Rob, Shenghui Wang and Andrea Scharnhorst. 2015. “Contextualization
of topics - browsing through terms, authors, journals and cluster allocations”. In
Proceedings of 15th International Conference on Scientometrics & Informetrics.
Istanbul, Turkey. Paper
16.
Explore. Share. Magnify.
Thank you
Shenghui Wang
Rob Koopman
OCLC Research EMEA
shenghui.wang@oclc.org
rob.koopman@oclc.org