Azzopardi2012economics of iir_tech_talk


In this talk, I discuss how microeconomics can be used to describe, explain and predict the interactions of a user and an information retrieval system. The work is based on the ACM SIGIR 2011 paper ( http://dl.acm.org/citation.cfm?id=2009923 ) and is available to download from: http://www.dcs.gla.ac.uk/~leif/papers/azzopardi2011economics.pdf

Published in: Technology, Economy & Finance
  • In this talk I will discuss how we can use microeconomics to describe how users interact with a retrieval system. Essentially, I will model how the benefit (gain or performance) a user obtains from a system, the interactions they perform, and the cost of these interactions relate to one another.
  • Note: related to this work is the work by Pirolli, Card and Ed Chi on Information Foraging Theory.
  • Belkin (2008) outlined some of the challenges within IIR. Jarvelin (2011) also argued the need to understand information systems through the development of formal models and testable theories that describe the interaction between users and systems. It is a major research challenge because of all the complexities involved with users, their interactions with information and the systems that they employ.
  • So this provides an economic justification for posing short queries.
  • Microeconomics might give us the right tools to model IIR. We have built a formal model based on production theory from economics, which explains, predicts, etc. An area that looks how to
  • A firm produces output (such as goods or services). A firm requires inputs (such as capital and labor). A firm utilizes some form of technology to then transform the inputs into outputs.
  • Technological Constraints: the constraints imposed by the technology, i.e. its efficiency or ability to produce the output. Production Set: the set of all possible combinations of inputs that yield the desired output. Production Function: a set of points where the desired output is obtained for the minimum combination of inputs.
  • Inputs: the number of queries, the length of queries, the number of documents assessed per query, etc. Output: a number of relevant documents (or gain from the relevant information found). Technology used: a search engine.
  • And if we map a cost function to the interactions then we can ask, “what is the most cost-efficient way for a user to interact with an IR system?” What strategy should a user employ to achieve their goal?
  • Yes, the model is abstracted and general. Search has many more inputs and outputs, and lots more variables, but we have abstracted away these details. We have simplified the search process to two core variables that affect the output. But this doesn’t mean the model doesn’t have any explanatory power. It is representative, but not necessarily wholly realistic.
  • Now that we have framed the search process as an economics problem, and we have an economic model that describes the output given the inputs and the technology, the big question is: WHAT CAN WE DO WITH IT? So to explore the application of this theory to IIR we perform an economic analysis of search.
  • Airbus subsidies by European governments. Cases of insider trading. Tropical storms where people were killed. Simulated interaction: i.e. to determine the minimum inputs for the desired output, and thus obtain the production function.
  • Simulated interaction: i.e. to determine the minimum inputs for the desired output, and thus obtain the production function. To explore the range of possible user strategies, i.e. examine all the combinations of inputs: queries of length 3 were generated for each topic given the relevant documents, i.e. create high quality queries. A session comprised a series of queries and a given assessment depth, and ended when the desired gain was achieved. A Best-First approach was used to obtain an approximation for an empirical production function.
  • The blue and purple lines converge because I stopped the simulation when ncg > 0.2, and not when (ncg > 0.2 and < 0.4). That is why they converge: by the time A = 200 and the same query is submitted, the gain is the same, > 0.2 and > 0.4. I really should have fixed the simulation and stopped when the gain was greater than 0.25, so that the production curve for 0.2 gain would stop at about A = 75.
  • So far we have only examined empirical estimates of the production curves/functions. It would be good if we could fit a mathematical function to these curves to have a well-defined model.
  • So far we have empirically estimated the production function. However, it is common in economics to fit a functional form to the production function – so that we can mathematically describe the production process.
  • So far, we have obtained a model which, given the inputs and a particular technology, estimates the total cumulative gain. However, given that we want to determine what strategy minimizes the user's cost, we need to formulate a cost function to represent the cost of interaction. This assumes that the cost of assessing one document is equal to ONE.
  • Now that we have a way to frame the interaction between a user and a system when searching, we can hypothesise about the user's behavior if variables or parameters in the model change. For example…
  • Insert graph here from ECON-IIR paper.
  • Transcript of "Azzopardi2012economics of iir_tech_talk"

    1. 1. The Economics in Interactive Information Retrieval. Leif Azzopardi. http://www.dcs.gla.ac.uk/~leif
    2. 2. Interaction, Cost, Benefit
    3. 3. Interactive and Iterative Search: a simplified, abstracted representation. [Diagram labels: Information Need, User, Queries, System, Documents Returned, Relevant Information]
    4. 4. Observational & Empirical: Berry Picking, ASK, IS&R Framework. Theoretical & Formal: Information Foraging Theory, Pirolli (1999)
    5. 5. Theoretical & Formal: A Major Research Challenge. Interactive Information Retrieval needs formal models to:
        • describe, explain and predict the interaction of users with systems,
        • provide a basis on which to reason about interaction,
        • understand the relationships between interaction, performance and cost,
        • help guide the design, development and research of information systems, and
        • derive laws and principles of interaction.
        Belkin (2008), Jarvelin (2011)
    6. 6. How do users behave? Why do users behave like this?
        • User queries tend to be short (only 2-3 terms)
        • Web searchers typically only examine the first page of results
        • Users will often pose a series of short queries
        • Patent searchers typically examine 100-200 documents per query (using a Boolean system)
        • Users adapt to degraded systems by issuing more queries
        • Patent searchers usually express longer and complex queries
        • Users rarely provide explicit relevance feedback
    7. 7. So why do users pose short queries? User queries tend to be short. But longer queries tend to be more effective!
    8. 8. So why do users pose short queries? [Plot: Total Performance and Marginal Performance (y-axis: Performance, 0 to 0.5) against Query Length (No. of Terms, 0 to 30)] Exponentially diminishing returns kick in after 2 query terms. Around 2-3 terms is where the user gets the most bang for their buck. Azzopardi (2009)
    9. 9. How can we use microeconomics to model the search process?
    10. 10. Microeconomics: Production Theory, Consumer Theory, Utility Maximization, Cost Minimization
    11. 11. Production Theory, a.k.a. Theory of Firms. Inputs: Capital, Labor. Output: Widgets. The Firm utilizes Technology, and Technology constrains the Firm. Varian (1987)
    12. 12. Production Functions. [Plot: Production Function; axes: Capital vs. Labor]
    13. 13. Production Functions. Quantity = F ( Capital, Labor ). [Plot: Production Function curves for Quantity 1, Quantity 2, Quantity 3; axes: Capital vs. Labor]
    14. 14. Production Functions. [Plot: Production Set and Production Function curves for Quantity 1, Quantity 2, Quantity 3; axes: Capital vs. Labor]
    15. 15. Production Functions. Technology constrains the production set. [Plot: Production Set and Production Function curves for Quantity 1, Quantity 2, Quantity 3; axes: Capital vs. Labor]
    16. 16. Applying Production Theory to Interactive Information Retrieval
    17. 17. Interactive and Iterative Search: a simplified, abstracted representation. [Diagram labels: Information Need, User, Queries, System, Documents Returned, Relevant Information]
    18. 18. Search as Production. Inputs: Queries, Relevance Assessments. Output: Gain. The Firm utilizes Search Engine Technology, and the technology constrains the Firm.
    19. 19. Search Production Function: Gain = F(Q, A). The function represents how well a system could be used, i.e. the min input required to achieve that level of gain. [Plot: curves for Gain = 10, Gain = 20, Gain = 30; axes: No. of Queries (Q) vs. No. of Assessments per Query (A)]
    20. 20. What strategies can the user employ when interacting with the search system to achieve their end goal? Lots of Queries, Few Assessments? Few Queries, Lots of Assessments? Or some other way? What is the most cost-efficient way for a user to interact with an IR system?
    21. 21. Modeling Caveats of an economic model of the search process: Abstracted, Simplified, Representative
    22. 22. What does the model tell us about search & interaction?
    23. 23. Search Scenario
        • Task: Find news articles about ….
        • Goal: To find a number of relevant documents and reach the desired level of Cumulative Gain.
        • Output: Total Cumulative Gain (G) across the session
        • Inputs:
          – Y: No. of Queries, and
          – X: No. of Assessments per Query
        • Collections:
          – TREC News Collections (AP, LA, Aquaint)
          – Each topic had about 30 or more relevant documents
        • Simulation: built using C++ and the Lemur IR toolkit
    24. 24. Simulating User Interaction
        • Models: Probabilistic, Vector Space, Boolean
        • Data: TREC Aquaint, TREC Topics, TREC documents marked relevant
        • Queries of length 3 are generated from the relevant set
        • The simulated user selects the best query first/next, issues Y queries, and assesses X documents per query
        • X & Y are recorded for each level of gain
        • The simulation assumes the user has perfect information, in order to find out how well the system could be used.
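
The best-first simulation described on this slide can be sketched roughly as follows. This is only an illustrative Python sketch, not the talk's C++/Lemur implementation: the function name `simulate_session`, the toy rankings, and the choice to measure gain as the fraction of relevant documents found are all assumptions made here for the example.

```python
def simulate_session(rankings, relevant, depth, target_gain):
    """Return the number of queries needed to reach target_gain, or None.

    rankings: dict mapping a query string to its ranked list of doc ids.
    relevant: set of relevant doc ids (the simulated user knows these).
    depth: number of documents assessed per query (X on the slide).
    target_gain: fraction of the relevant set to find (an assumed gain measure).
    """
    found = set()
    remaining = dict(rankings)
    issued = 0
    needed = target_gain * len(relevant)

    def new_hits(query):
        # Relevant documents in the top `depth` results not found so far.
        return len((set(remaining[query][:depth]) & relevant) - found)

    while len(found) < needed and remaining:
        # Best-first: choose the unissued query with the most new relevant docs.
        best = max(remaining, key=new_hits)
        if new_hits(best) == 0:
            return None  # no remaining query helps at this depth
        found |= set(remaining.pop(best)[:depth]) & relevant
        issued += 1
    return issued if len(found) >= needed else None


# Hypothetical toy data: three candidate queries and six relevant documents.
TOY_RANKINGS = {
    "q1": ["d1", "d2", "x1", "d3"],
    "q2": ["d4", "x2", "d5", "d1"],
    "q3": ["x3", "d6", "x4", "x5"],
}
TOY_RELEVANT = {"d1", "d2", "d3", "d4", "d5", "d6"}

if __name__ == "__main__":
    # One point per depth gives a crude empirical production curve.
    for depth in range(1, 5):
        queries = simulate_session(TOY_RANKINGS, TOY_RELEVANT, depth, 0.5)
        print(f"depth={depth}: queries needed={queries}")
```

Recording the number of queries for each depth and gain level, as the loop at the end does, is what yields the points of an empirical production curve.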
    25. 25. Search Production Curves: Same Retrieval Model, Different Gain (TREC Aquaint Collection). [Plot: BM25 NCG=0.2 and BM25 NCG=0.4; axes: No. of Queries (0 to 20) vs. No. of Assessments per Query (0 to 300)] To double the gain requires more than double the no. of assessments. Annotated points:
        • 8 Q & 15 A/Q gets NCG = 0.4
        • 4 Q & 40 A/Q gets NCG = 0.4
        • 7.7 Q & 5 A/Q gets NCG = 0.2
        • 3.6 Q & 15 A/Q gets NCG = 0.2
    26. 26. Search Production Curves: Different Retrieval Models, Same Gain (TREC Aquaint Collection). [Plot: BM25 NCG=0.4, BOOL NCG=0.4, TFIDF NCG=0.4; axes: No. of Queries (0 to 20) vs. No. of Assessments per Query (0 to 300)]
        • No input combinations with depth less than this are technically feasible!
        • For the same gain, BOOL and TFIDF require a lot more interaction.
        • BM25 provides more strategies (i.e. input combinations) than BOOL or TFIDF.
        • User Adaptation: BM25: 5 Q @ 25 A/Q; BOOL: 10 Q @ 25 A/Q. More queries on the degraded systems.
    27. 27. Search Production Function: Cobb-Douglas Production Function
        $f(Q, A) = K \cdot Q^{\alpha} \cdot A^{(1-\alpha)}$
        where Q is the no. of queries issued, A is the no. of assessments per query, K is the efficiency of the technology used, and α is the mixing parameter determined by the technology.
        Example values on Aquaint when NCG = 0.6:
        Model    K      α      Goodness of Fit
        BM25     5.39   0.58   0.995
        BOOL     3.47   0.58   0.992
        TFIDF    1.69   0.50   0.997
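
To make the functional form concrete, here is a small Python sketch that evaluates f(Q, A) = K·Q^α·A^(1−α) with the K and α values listed on the slide. The function name `search_production` and the example inputs (5 queries at 25 assessments per query) are choices made for this illustration; the slide does not state the units of the fitted output, so the printed numbers are only meaningful relative to each other.

```python
# Fitted values as listed on the slide (Aquaint, NCG = 0.6).
FITTED = {
    "BM25": {"K": 5.39, "alpha": 0.58},
    "BOOL": {"K": 3.47, "alpha": 0.58},
    "TFIDF": {"K": 1.69, "alpha": 0.50},
}


def search_production(num_queries, assessments_per_query, k, alpha):
    """Cobb-Douglas form: f(Q, A) = K * Q^alpha * A^(1 - alpha)."""
    return k * (num_queries ** alpha) * (assessments_per_query ** (1.0 - alpha))


if __name__ == "__main__":
    for model, params in FITTED.items():
        value = search_production(5, 25, params["K"], params["alpha"])
        print(f"{model}: f(5, 25) = {value:.2f}")
```

Because the two exponents sum to one, doubling both inputs doubles the output, which is a standard property of this form of the Cobb-Douglas function.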
    28. 28. Using the Cobb-Douglas Search Function. We can differentiate the function to find the rates of change of the input variables.
        • Marginal Product of Querying, $\partial f(Q, A) / \partial Q$: the change in gain over the change in querying, i.e. how much more gain do we get if we pose extra queries
        • Marginal Product of Assessing, $\partial f(Q, A) / \partial A$: the change in gain over the change in assessing, i.e. how much more gain do we get if we assess extra documents
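
For reference, differentiating the Cobb-Douglas form f(Q, A) = K·Q^α·A^(1−α) gives the two marginal products below. This is a standard calculus step written out here, not a formula quoted from the slides.

```latex
\begin{align*}
\frac{\partial f(Q, A)}{\partial Q}
  &= \alpha K Q^{\alpha - 1} A^{1 - \alpha}
   = \alpha \, \frac{f(Q, A)}{Q}, \\
\frac{\partial f(Q, A)}{\partial A}
  &= (1 - \alpha) K Q^{\alpha} A^{-\alpha}
   = (1 - \alpha) \, \frac{f(Q, A)}{A}.
\end{align*}
```

Both marginal products shrink as their own input grows (for 0 < α < 1), which is the diminishing-returns behavior the production curves display.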
    29. 29. Technical Rate of Substitution. How many more assessments per query are needed, if one less query was posed? TRS of Assessments for Queries: $TRS(A, Q) = \partial A / \partial Q$. [Plot: BM25 NCG=0.4; axes: No. of Queries (0 to 20) vs. No. of Assessments per Query (0 to 300); TRS values annotated along the curve: 0.4, 1.2, 2.5, 4.2, 8.3] At the point annotated 1.2, if you gave up one query you’d need to assess 1.2 extra docs/query. EXAMPLE: If 5 queries are submitted, instead of 6, then 24.2 docs/query need to be assessed, instead of 20 docs/query:
        • 6 Q @ 20 A/Q = 120 A
        • 5 Q @ 24.2 A/Q = 121 A
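
Under the Cobb-Douglas form f(Q, A) = K·Q^α·A^(1−α), the technical rate of substitution has a simple closed form. The derivation below is a sketch that assumes this functional form holds exactly; the TRS values annotated on the slide come from the empirical curve and need not match it.

```latex
\begin{align*}
\text{Along a curve of constant gain: } \quad
  0 &= \frac{\partial f}{\partial Q}\, dQ + \frac{\partial f}{\partial A}\, dA \\
\Rightarrow \quad
  TRS(A, Q) = \frac{\partial A}{\partial Q}
  &= -\frac{\partial f / \partial Q}{\partial f / \partial A}
   = -\frac{\alpha}{1 - \alpha} \cdot \frac{A}{Q}.
\end{align*}
```

The negative sign says that fewer queries must be offset by more assessments per query; the magnitude is the number of extra assessments per query needed for each query given up.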
    30. 30. What about the cost of interaction?
    31. 31. User Search Cost Function: a linear cost function
        $c(Q, A) = \beta \cdot Q + Q \cdot A$
        where Q is the no. of queries issued, A is the no. of assessments per query, Q·A is the total no. of documents assessed, and β is the relative cost of a query to an assessment.
        What is the relative cost of a query? Using cognitive costs of querying and assessing taken from Gwizdka (2010):
        • The average cost of querying was 2628 ms
        • The average cost of assessing was 2226 ms
        • So β was set to 2628/2226 = 1.1598
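
As a quick illustration of the cost function, the Python sketch below evaluates c(Q, A) = β·Q + Q·A for the two strategies from the technical rate of substitution example (6 queries at 20 assessments per query, and 5 queries at 24.2). The function name `search_cost` is made up for this example, and β is simply taken as the 1.1598 stated on the slide rather than recomputed.

```python
BETA = 1.1598  # relative cost of a query, as stated on the slide


def search_cost(num_queries, assessments_per_query, beta=BETA):
    """Linear cost c(Q, A) = beta * Q + Q * A, in units of one assessment."""
    return beta * num_queries + num_queries * assessments_per_query


if __name__ == "__main__":
    for queries, depth in [(6, 20.0), (5, 24.2)]:
        cost = search_cost(queries, depth)
        print(f"{queries} queries @ {depth} assessments/query: cost = {cost:.2f}")
```

Costs are expressed relative to assessing one document, so a strategy's cost is its total number of assessments plus β for every query issued.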
    32. 32. Cost Efficient Strategies: BM25 0.4 and 0.6 Gains. [Plots for BM25@0.4 and BM25@0.6: No. of Queries (0 to 50) and Cost (130 to 380, minimum cost marked), each against No. of Assessment per Query (0 to 30)] On BM25, to increase gain, pose more queries, but examine the same no. of docs per query.
    33. 33. Cost Efficient Strategies: BOOL 0.4 & 0.6 Gains. [Plots for BOOL@0.4 and BOOL@0.6: No. of Queries (0 to 12) and Cost (300 to 1500, minimum cost marked), each against No. of Assessment per Query (0 to 200)] On Boolean, to increase gain, issue about the same no. of queries, but examine more docs per query.
    34. 34. Contrasting Systems: BM25 0.4 and 0.6 Gains vs. BOOL 0.4 and 0.6 Gains. [Plots as on the previous two slides: No. of Queries and Cost against No. of Assessment per Query for each system] On BM25 issue more queries, but examine fewer docs per query. BM25 is less costly to use than BOOL.
    35. 35. A Hypothetical Experiment. What happens if querying costs go down? More queries issued, and a decrease in assessments per query. What happens if querying costs go up? A decrease in queries issued, and an increase in assessments per query.
    36. 36. Changing the Relative Query Cost: $c(Q, A) = \beta \cdot Q + Q \cdot A$. As β increases, the relative cost of querying goes up; it is cheaper to assess more documents per query and consequently query less! [Plot: Cost vs. No. of Assessment per Query]
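
The effect of β can be explored numerically by combining the Cobb-Douglas production function with the linear cost function. The Python sketch below is an illustration under stated assumptions: it uses the BM25 values K = 5.39 and α = 0.58 from the earlier slide, a hypothetical target gain of 20 in the fitted function's (unstated) units, and a simple search over integer depths. The function names are invented for this example.

```python
def queries_needed(depth, gain, k, alpha):
    """Solve gain = K * Q^alpha * A^(1 - alpha) for Q at a given depth A."""
    return (gain / (k * depth ** (1.0 - alpha))) ** (1.0 / alpha)


def cost(num_queries, depth, beta):
    """Linear cost c(Q, A) = beta * Q + Q * A."""
    return beta * num_queries + num_queries * depth


def cheapest_depth(gain, k, alpha, beta, depths):
    """Return the depth on the grid with the lowest cost for the target gain."""
    return min(
        depths,
        key=lambda a: cost(queries_needed(a, gain, k, alpha), a, beta),
    )


if __name__ == "__main__":
    K, ALPHA = 5.39, 0.58   # BM25 values from the slide
    TARGET_GAIN = 20.0      # hypothetical target, units as in the fitted function
    grid = range(1, 301)    # candidate numbers of assessments per query
    for beta in (1.1598, 5.0, 10.0):
        depth = cheapest_depth(TARGET_GAIN, K, ALPHA, beta, grid)
        queries = queries_needed(depth, TARGET_GAIN, K, ALPHA)
        print(f"beta={beta}: depth={depth}, queries={queries:.1f}")
```

Under these assumptions the cheapest depth rises as β rises, in line with the slide's statement. For α > 0.5 the continuous optimum works out to A = β(1 − α)/(2α − 1), which does not depend on the target gain; for α ≤ 0.5 this model has no finite interior minimum, so the search would simply return the largest depth on the grid.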
    37. 37. Implications for Design
        • Knowing how benefit, interaction and cost relate can help guide how we design systems
          – We can theorize about how changes to the system will affect the user’s interaction
            • Is this desirable? Do we want the user to query more? Or for them to assess more?
          – We can categorize the type of user
            • Is this a savvy rational user? Or is this a user behaving irrationally?
          – We can scrutinize the introduction of new features
            • Are they going to be of any use? Are they worth it for the user? i.e. how much more performance, or how little must they cost?
    38. 38. Future Directions
        • Validate the theory by conducting observational & empirical research
          – Do the predictions about user behavior hold?
        • Incorporate other inputs into the model
          – Find Similar, Relevance Feedback, Browsing,
          – Query length, Query Type, etc
        • Develop more accurate cost functions
          – Obtain Better Estimates of Costs
        • Model other search tasks
    39. 39. Questions. Contact Details: Email: Leifos@acm.org, Skype: Leifos, Twitter: @leifos
    40. 40. Selected References
        • Varian, H., Intermediate Microeconomics, 1987
        • Varian, H., Economics and Search, ACM SIGIR Forum, 1999
        • Pirolli, P., Information Foraging Theory, 1999
        • Belkin, N., Some (what) grand challenges of Interactive Information Retrieval, ACM SIGIR Forum, 2008
        • Azzopardi, L., Query Side Evaluation, ACM SIGIR 2009 – http://dl.acm.org/citation.cfm?doid=1571941.1572037
        • Azzopardi, L., The Economics of Interactive Information Retrieval, ACM SIGIR 2011 – http://dl.acm.org/citation.cfm?doid=2009916.2009923
        • Jarvelin, K., IR Research: Systems, Interaction, Evaluation and Theories, ACM SIGIR Forum, 2011
    41. 41. Search Production Function: Example. G = F( X, Y ). [Axes: Interaction X, Interaction Y]
    42. 42. Search Production Function: Example application for web search. P@10 = F(L, A). [Plot: curves for P@10 = 0.1, P@10 = 0.2, P@10 = 0.3; axes: Length of Query (L) vs. No. of Assessments (A)]