
Improving Search Engines using Online Communities

Anatoliy Gruzd
Research Forum, Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign, IL
March 14, 2007

Published in: Technology, Education

  1. 1. Improving Search Engines using Online Communities
       It takes an [Internet] village …
       Anatoliy Gruzd <agruzd2@uiuc.edu>
       Research Forum, Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign, IL
       March 14, 2007
  2. 2. Agenda
       • Common search problems
       • Online bookmarking - http://del.icio.us
       • Pilot Study
       • Future work
  3. 3. Common search problems
       • The main drawback of all modern search engines is that they force the user to guess words that appear in all relevant documents and, at the same time, do not appear in non-relevant documents.
       1. A relevant page will not be retrieved if it does not contain the keywords that the user chose for searching.
       2. Even if the user's search keywords are found inside a web page, it does not always mean that the page is relevant to the user.
  4. 4. Query #1: weight loss
       [Diagram: Architecture of a typical search engine. Components: User's Query, Web page, Matching, Results. Example terms shown: weight loss, weight loss, ???]
  5. 5. Query #1: weight loss
       • http://www.paleofood.com/
           • Recipes are: grain-free, bean-free, potato-free, dairy-free, and sugar-free.
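The paleofood.com example can be illustrated with a minimal sketch of strict keyword matching. This is only an illustration, not how any particular search engine works: the helper names `tokenize` and `matches_all_keywords` are hypothetical, and the page text is limited to the one sentence quoted on the slide.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches_all_keywords(query, page_text):
    """Return True only if every query word appears in the page text."""
    return tokenize(query) <= tokenize(page_text)

# Page description quoted on the slide for http://www.paleofood.com/
page_text = ("Recipes are: grain-free, bean-free, potato-free, "
             "dairy-free, and sugar-free.")

# The page never uses the words "weight" or "loss", so it is missed.
print(matches_all_keywords("weight loss", page_text))         # False
# A query built from words that do occur on the page succeeds.
print(matches_all_keywords("sugar free recipes", page_text))  # True
```

The user searching for "weight loss" would have to guess words such as "sugar-free" to reach this page, which is the first problem listed above.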
  6. 6. Query #2: assignment about "human brain" for homeschooling
       The retrieved page is an instructor's blog for a Human Development class at The Evergreen State College. It was retrieved because of two unrelated postings titled “Homeschoolers use selective socialization” and “Part Of Human Brain Functions Like A Digital Computer”.
  7. 7. Agenda
       • Common search problems
       • Online bookmarking - http://del.icio.us
       • Pilot Study
       • Future work
  8. 9. username
  9. 10. Common Tags for http://www.paleofood.com/
       • ethnic
       • evolutionary eating
       • food
       • allergies
       • german
       • naturopathic
       • primitivism
       • weight loss
  10. 11. [Diagram: the search engine architecture with Tags added. Components: User's Query, Web page, Tags, Matching, Results. Example terms shown: weight loss, weight loss, ???]
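A minimal sketch of the idea in this diagram, assuming that tags are simply added to the text a query is matched against. The function names are hypothetical; the page sentence and the tag list are the ones shown on the slides for http://www.paleofood.com/.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches(query, page_text, tags=()):
    """True if every query word occurs in the page text or in its tags."""
    searchable = tokenize(page_text)
    for tag in tags:
        searchable |= tokenize(tag)
    return tokenize(query) <= searchable

page_text = ("Recipes are: grain-free, bean-free, potato-free, "
             "dairy-free, and sugar-free.")
tags = ["ethnic", "evolutionary eating", "food", "allergies",
        "german", "naturopathic", "primitivism", "weight loss"]

print(matches("weight loss", page_text))        # False: page text only
print(matches("weight loss", page_text, tags))  # True: tags supply the words
```

Here the community-supplied tag "weight loss" provides vocabulary that the page itself never uses.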
  11. 12. Agenda
       • Common search problems
       • Online bookmarking - http://del.icio.us
       • Pilot Study
       • Future work
  12. 13. Pilot Study
       [Diagram: two systems compared on the same User's Query. System A: Matching against Web page -> Results A. System B: Matching against Tags -> Results B.]
  13. 14. Pilot Study
       • Search engine
           • Indri, a cooperative effort between the University of Massachusetts and Carnegie Mellon University
       • Search queries
           • ~20-30 real user questions found on the Internet
       • Pilot dataset
           • 454 health-related web pages
  14. 15. Pilot dataset: 454 health-related web pages
       “The Open Directory Project is the largest, most comprehensive human-edited directory of the Web.” http://dmoz.org
       • Started with ~64,000 URLs (from Top/Health/Conditions_and_Diseases)
       • -> only 544 are bookmarked by del.icio.us users
       • -> only 454 were accessible at the time of my experiment
       Category                               Pages
       /Digestive_Disorders                      23
       /Respiratory_Disorde                      26
       /Cardiovascular_Disorders                 35
       /Endocrine_Disorders                      53
       /Immune_Disorders/Immune_Deficiency       54
       /Cancer                                  101
       /Neurological_Disorders                  115
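The narrowing of the dataset can be checked with a few lines of arithmetic. This sketch uses only the figures stated on the slide; the starting count is approximate (~64,000), and the variable names are illustrative.

```python
# Figures as stated on the slide; the starting count is approximate.
start_urls = 64_000
bookmarked = 544
accessible = 454

# Per-category page counts listed on the slide (names copied as shown,
# including the truncated "/Respiratory_Disorde").
listed = {
    "/Digestive_Disorders": 23,
    "/Respiratory_Disorde": 26,
    "/Cardiovascular_Disorders": 35,
    "/Endocrine_Disorders": 53,
    "/Immune_Disorders/Immune_Deficiency": 54,
    "/Cancer": 101,
    "/Neurological_Disorders": 115,
}

print(f"bookmarked share of start set: {bookmarked / start_urls:.2%}")    # 0.85%
print(f"accessible share of bookmarked: {accessible / bookmarked:.1%}")   # 83.5%
print(f"pages in listed categories: {sum(listed.values())} of {accessible}")  # 407 of 454
```

Fewer than 1% of the starting URLs had been bookmarked by del.icio.us users. The seven categories listed account for 407 of the 454 pages; the slide does not say how the remaining pages are distributed.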
  15. 16. Noise in Tags
       • toread
       • todo
       • interesting
       • imported
       • safari_export
       • system:unfiled
       • .imported
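One simple way to deal with such tags is a stop-list filter. The sketch below is an assumption about how noise could be handled, not a description of what the pilot study did; the stop-list contains only the tags named on the slide, and `remove_noise` is a hypothetical helper.

```python
# Stop-list built from the noisy tags named on the slide.
NOISE_TAGS = {
    "toread", "todo", "interesting", "imported",
    "safari_export", "system:unfiled", ".imported",
}

def remove_noise(tags):
    """Drop tags that describe the bookmarker's workflow, not the page."""
    return [t for t in tags if t.strip().lower() not in NOISE_TAGS]

print(remove_noise(["food", "toread", "weight loss", "system:unfiled"]))
# ['food', 'weight loss']
```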
  16. 17. Compound tags
       • general health
       • computer software
       • cancer patients - support groups
       • high blood pressure
       • who i want to share with
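The slide does not say how compound tags were treated in the study. One possible approach, sketched here purely as an assumption, is to index each compound tag both as a whole phrase and as its individual words, so that a query for "blood pressure" can still reach a page tagged "high blood pressure". The helper `expand_tag` is hypothetical.

```python
import re

def expand_tag(tag):
    """Return the normalized whole tag followed by its individual words."""
    words = re.findall(r"[a-z0-9]+", tag.lower())
    phrase = " ".join(words)
    terms = [phrase]
    if len(words) > 1:
        terms.extend(words)
    return terms

print(expand_tag("high blood pressure"))
# ['high blood pressure', 'high', 'blood', 'pressure']
print(expand_tag("cancer patients - support groups"))
# ['cancer patients support groups', 'cancer', 'patients', 'support', 'groups']
print(expand_tag("food"))
# ['food']
```

A tag such as "who i want to share with" would still expand into words that say nothing about the page, so expansion alone does not remove noise.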
  17. 18. Tags-based / Keywords-based
       Results, in the order shown on the slide:
       • (+++) Neuroscience For Kids - Explore the nervous system
       • (+++)
       • (+++)
       • (---) / term "assignment"
       • (---) / term "brain [center]"
       • (+++) Neuroscience For Kids - Explore the nervous system
       Common tags: homeschool human medical reference education cognitive biology psychology anatomy
       [Diagram: System A: Matching against Web page -> Results A. System B: Matching against Tags -> Results B.]
  18. 19. Agenda
       • Common search problems
       • Online bookmarking - http://del.icio.us
       • Pilot Study
       • Future work
  19. 20. Future work
       • Use a larger dataset
       • Compare results across different subject domains and genres
       • Explore ways to combine tags and keywords, and determine whether doing so improves the quality of results (if at all)
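As one illustration of what combining tags and keywords could look like, the sketch below blends a keyword-overlap score with a tag-overlap score using a weight. This is a hypothetical scheme, not a method proposed in the slides: the scoring function, the `tag_weight` parameter, and its default of 0.5 are all assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def overlap(query, text):
    """Fraction of distinct query words that occur in the text."""
    query_words = set(tokenize(query))
    if not query_words:
        return 0.0
    return len(query_words & set(tokenize(text))) / len(query_words)

def combined_score(query, page_text, tags, tag_weight=0.5):
    """Weighted blend of keyword evidence and tag evidence (0.0 to 1.0)."""
    keyword_score = overlap(query, page_text)
    tag_score = overlap(query, " ".join(tags))
    return (1 - tag_weight) * keyword_score + tag_weight * tag_score

page_text = "Recipes are: grain-free, bean-free, potato-free, dairy-free, and sugar-free."
tags = ["food", "weight loss"]

# "recipes" comes from the page text; "weight" and "loss" come from the tags.
score = combined_score("weight loss recipes", page_text, tags)
print(round(score, 2))
```

Setting `tag_weight` to 0 gives a keywords-only system and setting it to 1 gives a tags-only system, so the same function could be used to compare the two extremes against a blend.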
