Visual Search for Supporting Content Exploration in Large Document Collections

Slides for a presentation at the First International Workshop on Mining Scientific Publications @ JCDL 2012
Paper: http://dlib.org/dlib/july12/herrmannova/07herrmannova.html

  1. 1. 1/48 Visual search for supporting content exploration in large document collections Drahomira Herrmannova and Petr Knoth
  2. 2. 2/48 Contents • What do we do • Information Visualisations and Visual Search Interfaces • Our approach • Conclusion
  3. 3. 3/48 Contents • What do we do • Information Visualisations and Visual Search Interfaces • Our approach • Conclusion
  4. 4. 4/48 What do we do • Improve search in (large) document collections • Examples of collections: – News articles – Cultural heritage collection – Collection of scientific papers • Current search engines: – Support for lookup – Much less support for exploration
  5. 5. 5/48 Search tasks (Rose and Levinson, 2004) • Undirected (or exploratory) queries – significant portion of all searches (Rose and Levinson, 2004)
  6. 6. 6/48 Exploratory search (Marchionini, 2006)
  7. 7. 7/48 How to support exploratory search • One possible solution – information visualisation • Why? – Easier to communicate structure, organisation and relations in content – Visually appealing
  8. 8. 8/48 Contents • What do we do • Information Visualisations and Visual Search Interfaces • Our approach • Conclusion
  9. 9. 9/48 Information Visualisation (1/2) • Division according to granularity of information – Collection level – Document level – Intra-document level
  10. 10. 10/48 Collection level visualisations • Visualise attributes of the collection • Typically aim at providing a general overview of the collection content • Examples
  11. 11. 11/48 Tag clouds (Hassan-Montero and Herrero-Solana, 2006)
  12. 12. 12/48 TIARA (Wei et al., 2010)
  13. 13. 13/48 GRIDL (Shneiderman et al., 2000)
  14. 14. 14/48 Document level visualisations • Visualise attributes of the collection items • Mutual links and relations of collection items • Examples
  15. 15. 15/48 Hopara (Milne and Witten, 2011)
  16. 16. 16/48 Wivi (Lehmann et al., 2010)
  17. 17. 17/48 Apolo (Chau et al., 2011)
  18. 18. 18/48 Intra-document level visualisations • Visualise the internal structure of a document • Example
  19. 19. 19/48 TileBars (Hearst, 1995)
  20. 20. 20/48 Information Visualisation (2/2) • Division according to the “starting point” of the visualisation – Browsing focused – Query focused
  21. 21. 21/48 Browsing focused • Exploration starts at a specific point in the collection from which the user navigates through the collection • Usually the same starting point is used every time
  22. 22. 22/48 InfoSky (Granitzer et al., 2004)
  23. 23. 23/48 Query focused • Starts with a query • The query determines the entry point from which the exploration starts
  24. 24. 24/48 ThinkPedia (Hirsch et al., 2009)
  25. 25. 25/48 Our approach • Document level information • Query focused browsing
  26. 26. 26/48 Design principles (1/2) • For visual search interfaces • Should be considered when designing the interface • Related studies: – Chen and Yu, 2000 – Sebrechts et al., 1999
  27. 27. 27/48 Design principles (2/2) 1. Added value 2. Simplicity 3. Visual legibility 4. Use of colours 5. Dimension 6. Fixed spatial location
  28. 28. 28/48 Contents • What do we do • Information Visualisations and Visual Search Interfaces • Our approach • Conclusion
  29. 29. 29/48 Considered types of collections • Every document in a collection defined according to a set of dimensions • Dimensions typically of different types • Document = set of properties expressing values of dimensions • Dimensions always present • Examples
  30. 30. 30/48 News articles collection • Dimensions: – Time – Themes – Locations – Relations to other articles
  31. 31. 31/48 Cultural heritage artifacts • Dimensions: – Artifact type – Historical period – Style – Material
  32. 32. 32/48 Scientific papers • Dimensions: – Citations – Authors – Concepts – Similarities with other articles
  33. 33. 33/48 The visualisation
  34. 34. 34/48 The visualisation
  35. 35. 35/48 The visualisation
  36. 36. 36/48 The visualisation
  37. 37. 37/48 Discovering connections
  38. 38. 38/48 Comparing and contrasting documents
  39. 39. 39/48 Limitations • Not restricted in theory; in practice the limits might be: – the size and resolution of the screen – the limitations of human perception
  40. 40. 40/48 Contents • What do we do • Information Visualisations and Visual Search Interfaces • Our approach • Conclusion
  41. 41. 41/48 Conclusion (1/2) • Motivation: 1. Provide better support for exploratory search than current textual interfaces 2. Interface that is conceptually applicable in any document collection regardless of its type 3. Provide an added value by assisting in the discovery of interesting connections that would otherwise remain hidden
  42. 42. 42/48 Conclusion (2/2) • Results: 1. Support for comparing and contrasting content. 2. Support for exploration across dimensions. 3. Universal approach to the visualised dimensions.
  43. 43. 43/48 Future plans • Planned release end of June • Integration with CORE system • Evaluation
  44. 44. 44/48 References (1/4) • G. Marchionini. Exploratory search: from finding to understanding. Communications of the ACM - Supporting exploratory search. 2006. • D. Rose & D. Levinson. Understanding user goals in web search. Proceedings of the 13th conference on World Wide Web. 2004. • Yusef Hassan-Montero and Victor Herrero-Solana. Improving tag-clouds as visual information retrieval interfaces. In MERIDA, INSCIT2006 CONFERENCE. 2006. • Furu Wei, Shixia Liu, Yangqiu Song, Shimei Pan, Michelle X. Zhou, Weihong Qian, Lei Shi, Li Tan, and Qiang Zhang. Tiara: a visual exploratory text analytic system. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. 2010.
  45. 45. 45/48 References (2/4) • Ben Shneiderman, David Feldman, Anne Rose, and Xavier Ferré Grau. Visualizing digital library search results with categorical and hierarchical axes. In Proceedings of the fifth ACM conference on Digital libraries. 2000. • Marti A. Hearst. TileBars: Visualization of Term Distribution Information in Full Text Information Access. In the Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems. 1995. • David Milne, Ian Witten. A link-based visual search engine for Wikipedia. Proceeding of the 11th annual international ACM/IEEE joint conference on Digital libraries. 2011. • Simon Lehmann, Ulrich Schwanecke, and Rolf Dorner. Interactive visualization for opportunistic exploration of large document collections. Information Systems. 2010.
  46. 46. 46/48 References (3/4) • Duen Horng Chau, Aniket Kittur, Jason I. Hong, and Christos Faloutsos. Apolo: making sense of large network data by combining rich user interaction and machine learning. In Proceedings of the 2011 annual conference on Human factors in computing systems. 2011. • Michael Granitzer, Wolfgang Kienreich, Vedran Sabol, Keith Andrews, and Werner Klieber. Evaluating a system for interactive exploration of large, hierarchically structured document repositories. In Proceedings of the IEEE Symposium on Information Visualization. 2004. • Christian Hirsch, John Hosking, and John Grundy. Interactive visualization tools for exploring the semantic graph of large knowledge spaces. Interfaces. 2009.
  47. 47. 47/48 References (4/4) • Chaomei Chen and Yue Yu. Empirical studies of information visualization: a meta-analysis. Int. J. Hum.-Comput. Stud. 2000. • Marc M. Sebrechts, John V. Cugini, Sharon J. Laskowski, Joanna Vasilakis, and Michael S. Miller. Visualization of search results: a comparative evaluation of text, 2d, and 3d interfaces. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. 1999.
  48. 48. 48/48 Thanks for listening! Questions?
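
Slides 29 to 32 describe each document as a set of properties expressing values along typed dimensions, and slide 37 mentions discovering connections between documents. A minimal sketch of that data model follows, assuming each dimension value can be represented as a set of strings. The `Document` class, the `shared_values` helper, and the sample papers are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Document:
    """A collection item described by named dimensions (hypothetical model)."""
    title: str
    # Maps a dimension name (e.g. "authors") to the set of values it holds.
    dimensions: Dict[str, Set[str]] = field(default_factory=dict)


def shared_values(a: Document, b: Document) -> Dict[str, Set[str]]:
    """Return, per dimension present in both documents, the values they share."""
    common: Dict[str, Set[str]] = {}
    for name in a.dimensions.keys() & b.dimensions.keys():
        overlap = a.dimensions[name] & b.dimensions[name]
        if overlap:
            common[name] = overlap
    return common


# Made-up sample data using the scientific-paper dimensions from slide 32.
paper_a = Document(
    "Paper A",
    {"authors": {"Author X"}, "concepts": {"visual search", "exploratory search"}},
)
paper_b = Document(
    "Paper B",
    {
        "authors": {"Author Y"},
        "concepts": {"exploratory search", "tag clouds"},
        "citations": {"Paper A"},
    },
)

shared = shared_values(paper_a, paper_b)
print(shared)
```

Because the dimension names are plain keys rather than fixed fields, the same structure would also hold the news-article or cultural-heritage dimensions listed on slides 30 and 31.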
