Exploring a world of networked information built from free-text metadata


Ariadne: interactive context viewer for bibliographic data

Published in: Technology

  1. Shenghui Wang, Rob Koopman
     Exploring a world of networked information built from free-text metadata
     OCLC Research EMEA
     ELAG 2015
  2. What would you do if you were interested in a topic?
  3. Difficult to answer these questions:
     • What are the different aspects of this topic?
     • Are there related aspects missing in my search terms?
     • Who are the most prominent authors on this topic?
     • Which journals publish the most on this topic?
     • How have others (e.g. librarians) described and classified this topic?
  4. Demo
     • http://thoth.pica.nl/relate?input=opac
  5. How do we do this?
     • OFFLINE: generate a semantic representation for each entity
     • ONLINE: find the most related entities and display them using multidimensional scaling
  6. Build semantic representation
     • Basic assumptions
       – An entity can be represented by its context
       – Entities that share more context are more likely to be related
     • Context is the textual environment in which an entity occurs
     • The effects of state prekindergarten programs on young children’s school readiness in five states
     • [author:jung kwanghee]
     • [subject:readiness for school]
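
The idea on slide 6, representing an entity by the text it occurs with, can be sketched as a simple co-occurrence count. This is only an illustration under stated assumptions: the record layout (`title`, `authors`, `subjects`), the stopword list, the tokenizer, and the second toy record are all hypothetical and are not taken from ArticleFirst or from the Ariadne implementation.

```python
from collections import Counter, defaultdict

# Hypothetical toy records; the field names are illustrative only.
RECORDS = [
    {
        "title": "The effects of state prekindergarten programs on young "
                 "children's school readiness in five states",
        "authors": ["jung kwanghee"],
        "subjects": ["readiness for school"],
    },
    {
        # Invented second record so that one subject has two contexts.
        "title": "School readiness and early reading in young children",
        "authors": ["doe jane"],
        "subjects": ["readiness for school"],
    },
]

# Assumed stopword list, kept tiny for the example.
STOPWORDS = {"the", "of", "on", "in", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, strip edge punctuation, drop stopwords."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def build_cooccurrence(records):
    """Map each entity to a Counter of the title terms it co-occurs with."""
    cooc = defaultdict(Counter)
    for rec in records:
        terms = tokenize(rec["title"])
        entities = [f"[author:{a}]" for a in rec["authors"]]
        entities += [f"[subject:{s}]" for s in rec["subjects"]]
        for entity in entities:
            cooc[entity].update(terms)
    return cooc


cooc = build_cooccurrence(RECORDS)
for entity, context in cooc.items():
    print(entity, context.most_common(3))
```

Each row of counts plays the role of one row of the co-occurrence matrix: entities that share many title terms end up with similar rows.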
  7. Dataset
     • ArticleFirst, 65 million articles
     • Selected 4 million entities (topical terms, authors, ISSNs, Dewey decimal codes)
     • Represented by 1 million topical terms
     But a matrix of 4M × 1M is too big to process
  8. Dimension reduction based on Random Projection
     • C: a co-occurrence matrix
     • R: a random matrix of ±1
     • C’: approximation of C after random projection (the semantic matrix)
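
A minimal sketch of the random projection on slide 8, assuming the usual formulation C’ = C·R in which each row of C is multiplied by a matrix R of random ±1 entries. The target dimension, the seed, and the toy counts below are assumptions chosen for illustration; the slides do not state the dimensionality actually used.

```python
import random


def random_sign_matrix(n_rows, n_cols, seed=0):
    """Build R: an n_rows x n_cols matrix whose entries are +1 or -1."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n_cols)] for _ in range(n_rows)]


def project(C, R):
    """Return C' = C R, skipping the zero entries of each (sparse) row of C."""
    k = len(R[0])
    projected = []
    for row in C:
        out = [0] * k
        for j, count in enumerate(row):
            if count:
                for d in range(k):
                    out[d] += count * R[j][d]
        projected.append(out)
    return projected


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0


# Toy co-occurrence matrix: 3 entities x 6 terms (made-up counts).
C = [
    [3, 0, 1, 0, 2, 0],
    [2, 0, 1, 0, 3, 0],
    [0, 4, 0, 2, 0, 1],
]
R = random_sign_matrix(n_rows=6, n_cols=4, seed=42)  # 6 terms -> 4 dimensions
C_prime = project(C, R)

print("original similarity :", cosine(C[0], C[1]))
print("projected similarity:", cosine(C_prime[0], C_prime[1]))
```

With only four target dimensions the projected similarity can differ noticeably from the original; the approximation generally improves as the number of dimensions grows, which is why a reduced matrix can stand in for the full 4M × 1M one.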
  9. Online interface
     • Find mutual nearest neighbors
     • Use multidimensional scaling to display
  10. Nearest neighbors
  11. Mutual nearest neighbors
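
The difference between slides 10 and 11 can be illustrated with a small sketch. The slides do not define "mutual nearest neighbors", so this assumes the common definition: two entities are mutual nearest neighbors when each appears in the other's list of k nearest neighbors by cosine similarity. The vectors, the entity names, and the value of k are made up for the example.

```python
def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0


def nearest_neighbors(vectors, k):
    """Map each entity to the set of its k most cosine-similar other entities."""
    neighbors = {}
    for name, vec in vectors.items():
        ranked = sorted(
            (other for other in vectors if other != name),
            key=lambda other: cosine(vec, vectors[other]),
            reverse=True,
        )
        neighbors[name] = set(ranked[:k])
    return neighbors


def mutual_nearest_neighbors(vectors, k):
    """Keep a neighbor only if the relation holds in both directions."""
    nn = nearest_neighbors(vectors, k)
    return {name: {o for o in nn[name] if name in nn[o]} for name in vectors}


# Hypothetical 2-dimensional semantic vectors.
VECTORS = {
    "brain": [1.0, 0.0],
    "neuron": [0.9, 0.1],
    "cortex": [0.8, 0.3],
    "ice": [0.0, 1.0],
}

nn = nearest_neighbors(VECTORS, k=2)
mutual = mutual_nearest_neighbors(VECTORS, k=2)
print("nearest:", nn["ice"])
print("mutual :", mutual["ice"])
```

Every entity always has k nearest neighbors, even an outlier such as "ice" here; requiring the relation to be mutual filters out these one-sided links.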
  12. Possible applications
      • Explorative interface
      • Context-based search:
        – brain
      • Journal finder
        – Arctic ice journals
        – http://brain.oxfordjournals.org/
      • Author name disambiguation
        – pre kindergarten
  13. Context matters!
      • What does “young” mean in
        – ArticleFirst
        – WorldCat
        – Astrophysics
        – Art
  14. Ariadne (demo) http://thoth.pica.nl/relate
      • An extremely fast way of navigating large-scale heterogeneous entities
      • Generalisable to different datasets
        – Full WorldCat
        – Small but highly curated astrophysics dataset
      • Supports explorative information retrieval and entity disambiguation
  15. References
      • Koopman, Rob, and Shenghui Wang. 2014. “Where Should I Publish? Detecting Journal Similarity Based on What Has Been Published There.” In Proceedings of Digital Libraries 2014, 483–484. London, United Kingdom. Association for Computing Machinery.
      • Koopman, Rob, Shenghui Wang, Andrea Scharnhorst, and Gwenn Englebienne. 2015. “Ariadne’s Thread: Interactive Navigation in a World of Networked Information.” In CHI '15 Extended Abstracts on Human Factors in Computing Systems. ACM, Seoul, South Korea.
      • Koopman, Rob, Shenghui Wang, and Andrea Scharnhorst. 2015. “Contextualization of topics - browsing through terms, authors, journals and cluster allocations.” In Proceedings of 15th International Conference on Scientometrics & Informetrics. Istanbul, Turkey.
  16. Explore. Share. Magnify.
      Thank you
      Shenghui Wang, Rob Koopman
      OCLC Research EMEA
      shenghui.wang@oclc.org
      rob.koopman@oclc.org
