Measuring the quality of web search engines
Prof. Dr. Dirk Lewandowski
University of Applied Sciences Hamburg
dirk.lewandowski@haw-hamburg.de


Tartu University, 14 September 2009
Agenda



 Introduction

 A few words about user behaviour

 Standard retrieval effectiveness tests vs. “Universal Search”

 Selected results: Results descriptions, navigational queries

 Towards an integrated test framework

 Conclusions

1 | Dirk Lewandowski
Search engine market: Germany 2009

[Market-share chart omitted] (Webhits, 2009)
Search engine market: Estonia 2007

[Market-share chart omitted] (Global Search Report 2007)
Why measure the quality of web search engines?




 • Search engines are the main access point to web content.

 • One player dominates the worldwide market.

 • Open questions
    – How good are search engines’ results?
    – Do we need alternatives to the “big three” (“big two”? “big one”?)?
    – How well do alternative search engines deliver an alternative view on web
      content?
    – How good must a new search engine be to compete?




A framework for measuring search engine quality



 • Index quality
     – Size of database, coverage of the web
     – Coverage of certain areas (countries, languages)
     – Index overlap
     – Index freshness

 • Quality of the results
    – Retrieval effectiveness
    – User satisfaction
    – Results overlap

 • Quality of the search features
    – Features offered
    – Operational reliability

 • Search engine usability and user guidance
                                                          (Lewandowski & Höchstötter, 2007)
Agenda



 Introduction

 A few words about user behaviour

 Standard retrieval effectiveness tests vs. “Universal Search”

 Selected results: Results descriptions, navigational queries

 Towards an integrated test framework

 Conclusions

8 | Dirk Lewandowski
Users invest relatively few cognitive resources in web searching.




 • Queries
    – Average length: 1.7 words (German language queries; English language queries
      slightly longer)
    – Approx. 50 percent of queries consist of just one word

 • Search engine results pages (SERPs)
    – 80 percent of users view no more than the first results page (10 results)
    – Users normally view only the first few results (“above the fold”)
    – Users only view up to five results per session
    – Session length is less than 15 minutes

 • Users are usually satisfied with the results given.


Results selection (top 11 results)

[Chart of results selection by position omitted] (Granka et al. 2004)
Standard design for retrieval effectiveness tests




 •   Select (at least 50) queries (from log files, from user studies, etc.)
 •   Select some (major) search engines
 •   Consider top results (use cut-off)
 •   Anonymise search engines, randomise results positions
 •   Let users judge results

 • Calculate precision scores
    – the proportion of relevant results among all results retrieved up to the
      corresponding cut-off position
 • Calculate/estimate recall scores
    – the proportion of relevant results shown by a certain search engine relative to
      all relevant results within the database
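These per-position scores are straightforward to compute. The sketch below is illustrative (function names and the binary judgment lists are assumptions, not from the talk); recall is approximated relative to a pooled set of relevant results, as is usual in web retrieval tests:

```python
def precision_at_k(judgments, k):
    """Share of relevant results among the top-k retrieved.

    judgments: binary relevance judgments (1 = relevant),
    ordered by result position for a single query."""
    top = judgments[:k]
    return sum(top) / len(top) if top else 0.0

def relative_recall(judgments, pooled_relevant):
    """Share of all known relevant results (pooled across engines)
    that this engine returned -- the usual proxy for recall, since
    the true number of relevant web documents is unknowable."""
    return sum(judgments) / pooled_relevant if pooled_relevant else 0.0

# Example: one query, top-10 cut-off, 8 relevant documents in the pool
judged = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_k(judged, 10))    # 0.4
print(relative_recall(judged, 8))    # 0.5
```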


Recall-Precision Graph (top 20 results)

[Graph omitted] (Lewandowski 2008)
Standard design for retrieval effectiveness tests


 • Problematic assumptions
    – Model of “dedicated searcher” (willing to select one result after the other and go
      through an extensive list of results)
    – Users want both high precision and high recall.

 • These studies do not consider
    – how many documents a user is willing to view / how many are sufficient for
      answering the query
    – how popular the queries used in the evaluation are
    – graded relevance judgements (relevance scales)
    – different relevance judgements by different jurors
    – different query types
    – results descriptions
    – users’ typical results selection behaviour
    – visibility of different elements in the results lists (through their presentation)
    – users’ preference for a certain search engine
    – diversity of the results set / the top results
    – ...
Results selection: simple results list

[Screenshot omitted]

Universal Search

[Annotated screenshot omitted: a Universal Search results page combining news
results, ads, organic results, image results, and video results]
Results descriptions

[Examples omitted of description sources: META description, Yahoo Directory,
Open Directory]
Results descriptions: keywords in context (KWIC)

[Example omitted]
Results selection: simple results list

[Screenshot omitted]

Results selection with descriptions

[Screenshot omitted]
Ratio of relevant results vs. relevant descriptions (top 20 results)

[Chart omitted]

Recall-precision graph (top 20 descriptions)

[Graph omitted]
Precision of descriptions vs. precision of results (Google)

[Chart omitted]
Recall-Precision Graph (top 20, DRprec = relevant descriptions leading to relevant results)

[Graph omitted]
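A measure like DRprec can be sketched as follows. This is an illustrative reconstruction (the per-position pairing of description and result judgments is an assumption), not the study's actual code:

```python
def dr_precision_at_k(desc_judgments, result_judgments, k):
    """Share of top-k positions where the description was judged relevant
    AND the underlying result was relevant too. Plain description precision
    is inflated by misleading snippets; this measure filters those out."""
    pairs = list(zip(desc_judgments[:k], result_judgments[:k]))
    if not pairs:
        return 0.0
    return sum(1 for d, r in pairs if d and r) / len(pairs)

# Example: the description at rank 2 looks relevant but leads to an
# irrelevant page, so it does not count.
descriptions = [1, 1, 1, 0, 1]
results      = [1, 0, 1, 0, 1]
print(dr_precision_at_k(descriptions, results, 5))   # 0.6
```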
Search engines deal with different query types.




 Query types (Broder, 2002):

 • Informational
     – Looking for information on a certain topic
     – User wants to view a few relevant pages

 • Navigational
    – Looking for a (known) homepage
    – User wants to navigate to this homepage, only one relevant result

 • Transactional
     – Looking for a website to complete a transaction
     – One or more relevant results
     – Transaction can be purchasing a product, downloading a file, etc.
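Automatic classification along Broder's taxonomy is usually done with heuristics or trained classifiers over query logs. The toy rules below are purely illustrative (the word lists and domain-suffix check are assumptions, not Broder's method):

```python
def classify_query(query):
    """Very rough heuristic for Broder's (2002) query types."""
    q = query.lower().strip()
    # Transaction verbs hint at transactional intent
    if any(word in q.split() for word in ("download", "buy", "order", "price")):
        return "transactional"
    # Domain-like queries usually aim at a known homepage
    if q.startswith("www.") or q.endswith((".com", ".org", ".de")):
        return "navigational"
    return "informational"

print(classify_query("ebay.com"))              # navigational
print(classify_query("buy used laptop"))       # transactional
print(classify_query("climate change causes")) # informational
```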
Percentage of unanswered queries (“navigational fail”)

[Chart omitted] (Lewandowski 2009)

Successfully answered queries at results position n

[Chart omitted] (Lewandowski 2009)
Results for navigational vs. informational queries




 • Studies should consider informational as well as navigational queries.

 • Queries should be weighted according to their frequency.

 • When more than 40% of queries are navigational, new search engines should put
   significant effort into answering these queries sufficiently.
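Weighting by frequency changes the overall score considerably when a few navigational queries dominate the log. A minimal sketch (names and example figures are invented for illustration):

```python
def weighted_effectiveness(scores, frequencies):
    """Frequency-weighted mean effectiveness: popular queries count more.

    scores:      {query: effectiveness in [0, 1]}
    frequencies: {query: number of occurrences in the query log}"""
    total = sum(frequencies[q] for q in scores)
    if total == 0:
        return 0.0
    return sum(scores[q] * frequencies[q] for q in scores) / total

# A frequent navigational query dominates a rare informational one
scores = {"ebay": 1.0, "deep sea fish": 0.5}
freqs  = {"ebay": 900, "deep sea fish": 100}
print(weighted_effectiveness(scores, freqs))   # 0.95
```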




Addressing major problems with retrieval effectiveness tests




 • We use both navigational and informational queries.
    – There is, however, no suitable framework for transactional queries.

 • We use query frequency data from the T-Online database.
    – The database consists of approx. 400 million queries from 2007 onwards.
    – We can use time series analysis.

 • We classify queries according to query type and topic.
    – We did a study on query classification based on 50,000 queries from T-Online log
      files to gain a better understanding of user intents. Data collection was
      “crowdsourced” to Humangrid GmbH.




Addressing major problems with retrieval effectiveness tests


 • We consider all elements on the first results page.
    – Organic results, ads, shortcuts
    – We will use clickthrough data from T-Online to measure “importance” of certain
      results.

 • Each result will be judged by several jurors.
    – Juror groups: Students, professors, retired persons, librarians, school children,
      other.
    – Additional judgements by the “general users” are collected in cooperation with
      Humangrid GmbH.

 • Results will be graded on a relevance scale.
    – Both results and descriptions will be judged.

 • We will classify all organic results according to
    – document type (e.g., encyclopaedia, blog, forum, news)
    – date
    – degree of commercial intent
Addressing major problems with retrieval effectiveness tests




 • We will count ads on results pages
    – Do search engines prefer pages carrying ads from the engine’s ad system?

 • We will ask users additional questions
    – Users will also judge the results set of each individual search engine as a whole.
    – Users will rank the search engines based on their result sets.
    – Users will indicate where they would have stopped viewing results.
    – Users will provide their own relevance-ranked list by card-sorting the
      complete results set from all search engines.

 • We will use printed screenshots of the results
    – Makes the study “mobile”
    – Especially important when considering certain user groups (e.g., elderly people).

State of current work




 • First wave of data collection starting in October.

 • Proposal for additional project funding sent to DFG (German Research
   Foundation).

 • Project on user intents from search queries is nearing completion.

 • Continuing collaboration with Deutsche Telekom, T-Online.




Conclusion




 • Measuring search engine quality is a complex task.

 • Retrieval effectiveness is a major aspect of search engine quality evaluation.

 • Established evaluation frameworks are not sufficient for the web context.




Thank you for your attention.
Prof. Dr.
Dirk Lewandowski

Hamburg University of Applied Sciences
Department Information
Berliner Tor 5
D - 20099 Hamburg
Germany



www.bui.haw-hamburg.de/lewandowski.html
E-Mail: dirk.lewandowski@haw-hamburg.de

Vladimir Iglovikov, Ph.D.
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 

Recently uploaded (20)

GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 

Measuring the quality of web search engines

  • 1. Measuring the quality of web search engines Prof. Dr. Dirk Lewandowski University of Applied Sciences Hamburg dirk.lewandowski@haw-hamburg.de Tartu University, 14 September 2009
  • 2. Agenda Introduction A few words about user behaviour Standard retrieval effectiveness tests vs. “Universal Search” Selected results: Results descriptions, navigational queries Towards an integrated test framework Conclusions 1 | Dirk Lewandowski
  • 3. Agenda Introduction A few words about user behaviour Standard retrieval effectiveness tests vs. “Universal Search” Selected results: Results descriptions, navigational queries Towards an integrated test framework Conclusions 2 | Dirk Lewandowski
  • 4. Search engine market: Germany 2009 3 | Dirk Lewandowski (Webhits, 2009)
  • 5. Search engine market: Estonia 2007 4 | Dirk Lewandowski (Global Search Report 2007)
  • 6. Why measure the quality of web search engines? • Search engines are the main access point to web content. • One player is dominating the worldwide market. • Open questions – How good are search engines’ results? – Do we need alternatives to “big three” (“big two”? “big one”?) – How good are alternative search engines in delivering an alternative view on web content? – How good must a new search engine be to compete? 5 | Dirk Lewandowski
  • 7. A framework for measuring search engine quality • Index quality – Size of database, coverage of the web – Coverage of certain areas (countries, languages) – Index overlap – Index freshness • Quality of the results – Retrieval effectiveness – User satisfaction – Results overlap • Quality of the search features – Features offered – Operational reliability • Search engine usability and user guidance (Lewandowski & Höchstötter, 2007) 6 | Dirk Lewandowski
  • 8. A framework for measuring search engine quality • Index quality – Size of database, coverage of the web – Coverage of certain areas (countries, languages) – Index overlap – Index freshness • Quality of the results – Retrieval effectiveness – User satisfaction – Results overlap • Quality of the search features – Features offered – Operational reliability • Search engine usability and user guidance (Lewandowski & Höchstötter, 2007) 7 | Dirk Lewandowski
  • 9. Agenda Introduction A few words about user behaviour Standard retrieval effectiveness tests vs. “Universal Search” Selected results: Results descriptions, navigational queries Towards an integrated test framework Conclusions 8 | Dirk Lewandowski
  • 10. Users invest relatively few cognitive resources in web searching. • Queries – Average length: 1.7 words (German-language queries; English-language queries slightly longer) – Approx. 50 percent of queries consist of just one word • Search engine results pages (SERPs) – 80 percent of users view no more than the first results page (10 results) – Users normally view only the first few results (“above the fold”) – Users view no more than five results per session – Session length is less than 15 minutes • Users are usually satisfied with the results given. 9 | Dirk Lewandowski
  • 11. Results selection (top11 results) (Granka et al. 2004) 10 | Dirk Lewandowski
  • 12. Agenda Introduction A few words about user behaviour Standard retrieval effectiveness tests vs. “Universal Search” Selected results: Results descriptions, navigational queries Towards an integrated test framework Conclusions 11 | Dirk Lewandowski
  • 13. Standard design for retrieval effectiveness tests • Select (at least 50) queries (from log files, from user studies, etc.) • Select some (major) search engines • Consider top results (use cut-off) • Anonymise search engines, randomise results positions • Let users judge results • Calculate precision scores – the ratio of relevant results in proportion to all results retrieved at the corresponding position • Calculate/assume recall scores – the ratio of relevant results shown by a certain search engine in proportion to all relevant results within the database. 12 | Dirk Lewandowski
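The precision score described in slide 13 (ratio of relevant results to all results retrieved up to a given position) can be sketched in a few lines. This is an illustrative sketch only; the function name and the binary judgments are invented for the example and are not the study's actual evaluation code.

```python
def precision_at_k(judgments, k):
    """Precision at cut-off k: the share of relevant results among
    the top k retrieved results (judgments: 1 = relevant, 0 = not)."""
    top = judgments[:k]
    return sum(top) / len(top) if top else 0.0

# Hypothetical binary judgments for one query's anonymised,
# position-randomised top-5 results.
judgments = [1, 0, 1, 1, 0]
p_at_3 = precision_at_k(judgments, 3)
```

In a full test, such scores would be averaged over all (at least 50) queries per search engine and plotted against the cut-off position, yielding the recall-precision graphs shown on the following slides.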
  • 14. Recall-Precision-Graph (top20 results) 13 | Dirk Lewandowski (Lewandowski 2008)
  • 15. Standard design for retrieval effectiveness tests • Problematic assumptions – Model of the “dedicated searcher” (willing to select one result after the other and work through an extensive list of results) – User wants both high precision and high recall. • These studies do not consider – how many documents a user is willing to view / how many are sufficient to answer the query – how popular the queries used in the evaluation are – graded relevance judgements (relevance scales) – different relevance judgements by different jurors – different query types – results descriptions – users’ typical results selection behaviour – visibility of different elements in the results lists (through their presentation) – users’ preference for a certain search engine – diversity of the results set / the top results – ... 14 | Dirk Lewandowski
  • 16. Results selection, simple 15 | Dirk Lewandowski
  • 17. Universal Search (screenshot) 16 | Dirk Lewandowski
  • 18. Universal Search (annotated screenshot: news results, ads, organic results, image results, video results, organic results contd.) 17 | Dirk Lewandowski
  • 19. Agenda Introduction A few words about user behaviour Standard retrieval effectiveness tests vs. “Universal Search” Selected results: Results descriptions, navigational queries Towards an integrated test framework Conclusions 18 | Dirk Lewandowski
  • 20. Results descriptions META Description Yahoo Directory Open Directory 19 | Dirk Lewandowski
  • 21. Results descriptions: keywords in context (KWIC) 20 | Dirk Lewandowski
  • 22. Results selection, simple 21 | Dirk Lewandowski
  • 23. Results selection with descriptions 22 | Dirk Lewandowski
  • 24. Ratio of relevant results vs. relevant descriptions (top20 results) 23 | Dirk Lewandowski
  • 25. Recall-precision graph (top20 descriptions) 24 | Dirk Lewandowski
  • 26. Precision of descriptions vs. precision of results (Google) 25 | Dirk Lewandowski
  • 27. Recall-Precision-Graph (Top20, DRprec = relevant descriptions leading to relevant results) 26 | Dirk Lewandowski
  • 28. Search engines deal with different query types. Query types (Broder, 2002): • Informational – Looking for information on a certain topic – User wants to view a few relevant pages • Navigational – Looking for a (known) homepage – User wants to navigate to this homepage, only one relevant result • Transactional – Looking for a website to complete a transaction – One or more relevant results – Transaction can be purchasing a product, downloading a file, etc. 27 | Dirk Lewandowski
  • 29. Search engines deal with different query types. Query types (Broder, 2002): • Informational – Looking for information on a certain topic – User wants to view a few relevant pages • Navigational – Looking for a (known) homepage – User wants to navigate to this homepage, only one relevant result • Transactional – Looking for a website to complete a transaction – One or more relevant results – Transaction can be purchasing a product, downloading a file, etc. 28 | Dirk Lewandowski
  • 30. Percentage of unanswered queries (“navigational fail”) 29 | Dirk Lewandowski (Lewandowski 2009)
  • 31. Successfully answered queries at results position n 30 | Dirk Lewandowski (Lewandowski 2009)
  • 32. Results for navigational vs. informational queries • Studies should consider informational as well as navigational queries. • Queries should be weighted according to their frequency. • When >40% of queries are navigational, new search engines should put significant effort into answering these queries sufficiently. 31 | Dirk Lewandowski
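Weighting queries by their frequency, as proposed on slide 32, changes what an overall score rewards: failing a navigational query that millions of users issue should hurt more than failing a rare one. A minimal sketch of such a frequency-weighted success rate (the function name and the frequency figures are invented for illustration, not data from the T-Online log files):

```python
def weighted_success(results):
    """Frequency-weighted success rate: each query contributes to the
    overall score in proportion to how often it occurs in the log file.
    results: list of (answered_successfully, query_frequency) pairs."""
    total = sum(freq for _, freq in results)
    hits = sum(freq for success, freq in results if success)
    return hits / total if total else 0.0

# Hypothetical outcome: one very frequent navigational query answered,
# one moderately frequent query missed, one rare query answered.
results = [(True, 900), (False, 80), (True, 20)]
score = weighted_success(results)
```

An unweighted success rate over the same three queries would be 2/3, while the weighted score is dominated by the frequent query, which is closer to what the average user experiences.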
  • 33. Agenda Introduction A few words about user behaviour Standard retrieval effectiveness tests vs. “Universal Search” Selected results: Results descriptions, navigational queries Towards an integrated test framework Conclusions 32 | Dirk Lewandowski
  • 34. Addressing major problems with retrieval effectiveness tests • We use both navigational and informational queries. – No suitable framework for transactional queries, though. • We use query frequency data from the T-Online database. – The database consists of approx. 400 million queries from 2007 onwards. – We can use time-series analysis. • We classify queries according to query type and topic. – We conducted a study on query classification based on 50,000 queries from T-Online log files to gain a better understanding of user intents. Data collection was “crowdsourced” to Humangrid GmbH. 33 | Dirk Lewandowski
  • 35. Addressing major problems with retrieval effectiveness tests • We consider all elements on the first results page. – Organic results, ads, shortcuts – We will use clickthrough data from T-Online to measure the “importance” of certain results. • Each result will be judged by several jurors. – Juror groups: students, professors, retired persons, librarians, school children, others. – Additional judgements by “general users” are collected in cooperation with Humangrid GmbH. • Results will be graded on a relevance scale. – Both results and descriptions will be judged. • We will classify all organic results according to – document type (e.g., encyclopaedia, blog, forum, news) – date – degree of commercial intent 34 | Dirk Lewandowski
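Having several jurors grade each result on a relevance scale, as slide 35 describes, requires aggregating their judgments into one score per result position. A minimal sketch of averaging graded judgments across jurors (the function name, the 0-4 scale, and the scores are invented for illustration; the project may well use a different scale or aggregation):

```python
from statistics import mean

def mean_graded_relevance(judgments_by_juror):
    """Aggregate graded relevance judgments: average each result
    position's scores over all jurors, giving one consensus score
    per position. Each inner list holds one juror's scores for the
    same ordered result list."""
    return [mean(scores) for scores in zip(*judgments_by_juror)]

# Three hypothetical jurors judging the same three results on a
# 0-4 scale (4 = highly relevant, 0 = not relevant).
jurors = [
    [4, 2, 0],
    [3, 2, 1],
    [4, 3, 0],
]
consensus = mean_graded_relevance(jurors)
```

Keeping the individual judgments (rather than only the means) also allows analysing how strongly the juror groups, e.g. students vs. professors, disagree.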
  • 36. Addressing major problems with retrieval effectiveness tests • We will count ads on results pages. – Do search engines prefer pages carrying ads from the engine’s own ad system? • We will ask users additional questions. – Users will also judge the results set of each individual search engine as a whole. – Users will rank the search engines based on their result sets. – Users will indicate where they would have stopped viewing further results. – Users will produce their own relevance-ranked list by card-sorting the complete results set from all search engines. • We will use printed screenshots of the results. – Makes the study “mobile”. – Especially important for certain user groups (e.g., elderly people). 35 | Dirk Lewandowski
  • 37. State of current work • First wave of data collection starting in October. • Proposal for additional project funding submitted to the DFG (German Research Foundation). • Project on user intents from search queries near completion. • Continuing collaboration with Deutsche Telekom / T-Online. 36 | Dirk Lewandowski
  • 38. Agenda Introduction A few words about user behaviour Standard retrieval effectiveness tests vs. “Universal Search” Selected results: Results descriptions, navigational queries Towards an integrated test framework Conclusions 37 | Dirk Lewandowski
  • 39. Conclusion • Measuring search engine quality is a complex task. • Retrieval effectiveness is a major aspect of SE quality evaluation. • Established evaluation frameworks are not sufficient for the web context. 38 | Dirk Lewandowski
  • 40. Thank you for your attention. Prof. Dr. Dirk Lewandowski Hamburg University of Applied Sciences Department Information Berliner Tor 5 D - 20099 Hamburg Germany www.bui.haw-hamburg.de/lewandowski.html E-Mail: dirk.lewandowski@haw-hamburg.de