Exploring a world of networked information built from free-text metadata

  1. Exploring a world of networked information built from free-text metadata. Shenghui Wang and Rob Koopman, OCLC Research EMEA, ELAG2015
  2. What would you do if you were interested in a topic?
  3. These questions are difficult to answer:
     • What are the different aspects of this topic?
     • Are related aspects missing from my search terms?
     • Who are the most prominent authors on this topic?
     • Which journals publish the most on this topic?
     • How have others (e.g. librarians) described and classified this topic?
  4. Demo: http://thoth.pica.nl/relate?input=opac
  5. How do we do this?
     • OFFLINE: generate a semantic representation for each entity
     • ONLINE: find the most related entities and display them using multidimensional scaling
  6. Build a semantic representation
     • Basic assumptions:
       – An entity can be represented by its context
       – Entities that share more context are more likely to be related
     • Context is the textual environment in which an entity occurs, e.g. one record:
       – The effects of state prekindergarten programs on young children’s school readiness in five states
       – [author:jung kwanghee]
       – [subject:readiness for school]
  7. Dataset
     • ArticleFirst: 65 million articles
     • 4 million selected entities (topical terms, authors, ISSNs, Dewey Decimal Classification codes)
     • Each entity represented by 1 million topical terms
     • But a 4M × 1M matrix is too big to process directly
  8. Dimension reduction based on random projection: C′ = C R
     • C: the co-occurrence matrix (entities × topical terms)
     • R: a random matrix of +1/-1 entries, with far fewer columns than C
     • C′: the approximation of C after random projection, i.e. the semantic matrix
  9. Online interface
     • Find mutual nearest neighbors
     • Display them using multidimensional scaling
  10. Nearest neighbors
  11. Mutual nearest neighbors
  12. Possible applications
     • Explorative interface
     • Context-based search
       – brain
     • Journal finder
       – Arctic ice journals
       – http://brain.oxfordjournals.org/
     • Author name disambiguation
       – pre kindergarten
  13. Context matters! What does “young” mean in:
     • ArticleFirst
     • WorldCat
     • Astrophysics
     • Art
  14. Ariadne (demo): http://thoth.pica.nl/relate
     • An extremely fast way of navigating large-scale heterogeneous entities
     • Generalisable to different datasets
       – The full WorldCat
       – A small but highly curated astrophysics dataset
     • Supports explorative information retrieval and entity disambiguation
  16. Explore. Share. Magnify. Thank you! Shenghui Wang and Rob Koopman, OCLC Research EMEA (shenghui.wang@oclc.org, rob.koopman@oclc.org)
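The slides describe the OFFLINE step (slides 5 to 8) only at the level of a diagram. A dense 4M × 1M matrix has 4 × 10^12 cells, so even at 4 bytes per cell it would occupy about 16 TB, which is why C is reduced to C′ = C R. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: an entity's context is taken to be every topical term in the same record, raw co-occurrence counts are used as weights, and R is a matrix of random ±1 entries with one reproducible row per term. The records, the `issn:hypothetical-0001` identifier and all function names are hypothetical. Because C R is linear, each row of C′ can be accumulated from the sparse row of C without materialising either matrix. The toy vocabulary is tiny, so the default `dim=256` is actually larger than it here; the sketch only shows the mechanics, and at ArticleFirst scale R would have far fewer columns than the 1 million term columns of C.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy records. Untagged tokens stand for topical terms; tagged
# tokens ("author:", "subject:", "issn:") stand for the other entity types.
RECORDS = [
    ["prekindergarten", "programs", "children", "school", "readiness",
     "author:jung kwanghee", "subject:readiness for school"],
    ["school", "readiness", "kindergarten", "children",
     "subject:readiness for school"],
    ["arctic", "sea", "ice", "climate", "issn:hypothetical-0001"],
    ["sea", "ice", "melting", "climate", "issn:hypothetical-0001"],
]


def is_topical_term(token):
    """Assumption: any token without a type prefix is a topical term."""
    return ":" not in token


def cooccurrence(records):
    """Sparse rows of C: entity -> Counter of topical terms in the same records."""
    rows = defaultdict(Counter)
    for record in records:
        terms = [t for t in record if is_topical_term(t)]
        for entity in record:
            for term in terms:
                if term != entity:  # a term is not part of its own context
                    rows[entity][term] += 1
    return rows


def sign_vector(term, dim, seed):
    """Row of R for one term: `dim` entries of +1 or -1, reproducible per term."""
    rng = random.Random(f"{seed}:{term}")
    return [rng.choice((-1, 1)) for _ in range(dim)]


def random_projection(rows, dim=256, seed=0):
    """C' = C R, computed row by row without materialising C or R."""
    r_cache = {}
    projected = {}
    for entity, context in rows.items():
        vec = [0] * dim
        for term, count in context.items():
            if term not in r_cache:
                r_cache[term] = sign_vector(term, dim, seed)
            for i, sign in enumerate(r_cache[term]):
                vec[i] += count * sign
        projected[entity] = vec
    return projected


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    semantic = random_projection(cooccurrence(RECORDS))
    subject = semantic["subject:readiness for school"]
    print(cosine(subject, semantic["author:jung kwanghee"]))    # shares context
    print(cosine(subject, semantic["issn:hypothetical-0001"]))  # shares none
```

The subject should come out far more similar to the author who shares its record than to the ISSN, which shares no context with it; the exact values depend on the random signs.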

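For the ONLINE step (slides 9 to 11), the slides name the techniques but not their details. Below is a minimal sketch under stated assumptions: similarity is cosine similarity between rows of C′; "mutual nearest neighbors" is read as mutual k-nearest neighbors (a candidate is kept only if the query is also among its own k nearest); and the display uses classical (Torgerson) multidimensional scaling on Euclidean distances, computed with power iteration so that only the standard library is needed. The vectors, entity names and function names are hypothetical.

```python
import math

# Hypothetical toy semantic vectors (in practice, rows of C').
VECTORS = {
    "term:ice": [1.0, 0.1, 0.0],
    "term:sea": [0.9, 0.2, 0.0],
    "issn:hypothetical-0001": [0.95, 0.15, 0.05],
    "term:school": [0.0, 1.0, 0.1],
    "author:hypothetical": [0.1, 0.9, 0.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(target, vectors, k):
    """The k entities most cosine-similar to `target`, best first (brute force)."""
    scored = sorted(
        ((cosine(vectors[target], vec), name)
         for name, vec in vectors.items() if name != target),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


def mutual_neighbors(query, vectors, k):
    """Neighbors of `query` that also list `query` among their own top k."""
    return [c for c in top_k(query, vectors, k) if query in top_k(c, vectors, k)]


def classical_mds(points, dims=2, iters=300):
    """Classical MDS: double-centre the squared Euclidean distance matrix and
    take its leading eigenvectors via power iteration with deflation."""
    n = len(points)
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    row = [sum(r) / n for r in d2]
    total = sum(row) / n
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for c in range(dims):
        v = [1.0 + 0.1 * i for i in range(n)]  # fixed, non-constant start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left along any remaining axis
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][c] = v[i] * scale
        # Deflate: remove the axis just found before looking for the next one.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    query = "term:school"
    names = [query] + mutual_neighbors(query, VECTORS, k=2)
    layout = classical_mds([VECTORS[n] for n in names])
    for name, (x, y) in zip(names, layout):
        print(f"{name:24s} {x:+.3f} {y:+.3f}")
```

In this toy data `term:sea` is among the two nearest neighbors of `term:school`, but `term:school` is not among those of `term:sea`, so only `author:hypothetical` survives the mutual check. A system at ArticleFirst scale would need an indexed or approximate nearest-neighbor search rather than the brute-force scan used here.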
Editor's Notes

  • OPAC -> journal -> author -> [author:medeiros norm] -> WorldCat

    Ambiguous names: [author:balas janet l], [author:balas j l]
  • Journal finder
    Name disambiguation
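The notes list [author:balas janet l] and [author:balas j l] as ambiguous name forms, and the slides name author name disambiguation as an application without describing a procedure. Purely as a hypothetical illustration of how context similarity could feed such a decision: two name forms are flagged as one author only when a simple surname/initial check passes and their semantic vectors are close. The name-matching rule, the 0.9 threshold, the vectors and the `author:balas j` entry are all invented for this sketch.

```python
import math

# Hypothetical semantic vectors: two compatible name forms with similar
# contexts, and a third compatible form with a very different context.
VECTORS = {
    "author:balas janet l": [0.8, 0.5, 0.1],
    "author:balas j l": [0.75, 0.55, 0.12],
    "author:balas j": [0.0, 0.1, 0.99],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def compatible_name_forms(a, b):
    """Hypothetical rule: same surname, and each aligned pair of given-name
    tokens agrees where one is an initial or prefix of the other."""
    ta = a.split(":", 1)[-1].split()
    tb = b.split(":", 1)[-1].split()
    if not ta or not tb or ta[0] != tb[0]:
        return False
    return all(x.startswith(y) or y.startswith(x) for x, y in zip(ta[1:], tb[1:]))


def likely_same_author(a, b, vectors, threshold=0.9):
    """True only if the names are compatible AND their contexts are similar."""
    return compatible_name_forms(a, b) and cosine(vectors[a], vectors[b]) >= threshold


if __name__ == "__main__":
    print(likely_same_author("author:balas janet l", "author:balas j l", VECTORS))
    print(likely_same_author("author:balas janet l", "author:balas j", VECTORS))
```

The second pair passes the name check but fails on context, which is the kind of case where context similarity, rather than string matching alone, would decide.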