Azzopardi2012economics of iir_tech_talk
 


In this talk, I discuss how microeconomics can be used to describe, explain and predict the interactions between a user and an information retrieval system. The work is based on the ACM SIGIR 2011 paper ( http://dl.acm.org/citation.cfm?id=2009923 ), which is available to download from: http://www.dcs.gla.ac.uk/~leif/papers/azzopardi2011economics.pdf


Usage Rights

CC Attribution-ShareAlike License

  • In this talk I will discuss how we can use microeconomics to describe how users interact with a retrieval system. Essentially, I will model how the benefit (gain or performance) a user obtains from a system, the interactions they perform, and the cost of those interactions relate to one another.
  • Note: related to this work is the work by Pirolli, Card and Ed Chi on Information Foraging Theory.
  • Belkin (2008) outlined some of the challenges within IIR. Jarvelin (2011) also argued the need to understand information systems through the development of formal models and testable theories that describe the interaction between users and systems. It is a major research challenge because of all the complexities involved with users, their interactions with information and the systems that they employ.
  • So this provides an economic justification for posing short queries.
  • Microeconomics might give us the right tools to model IIR. We have built a formal model based on production theory from economics, which explains, predicts, etc. An area that looks at how to ...
  • A firm produces output (such as goods or services). A firm requires inputs (such as capital and labor). A firm utilizes some form of technology to transform the inputs into outputs.
  • Technological Constraints: the constraints imposed by the technology, i.e. its efficiency or ability to produce the output. Production Set: the set of all possible combinations of inputs that yield the desired output. Production Function: the set of points where the desired output is obtained for the minimum combination of inputs.
  • Inputs: the number of queries, the length of queries, the number of documents assessed per query, etc. Output: a number of relevant documents (or the gain from the relevant information found). Technology used: a search engine.
  • And if we map a cost function to the interactions then we can ask, “what is the most cost-efficient way for a user to interact with an IR system?” What strategy should a user employ to achieve their goal?
  • Yes, the model is abstracted and general. Search has many more inputs and outputs, and lots more variables, but we have abstracted away these details. We have simplified the search process to two core variables that affect the output. But this doesn’t mean the model lacks explanatory power: it is representative, but not necessarily wholly realistic.
  • Now that we have framed the search process as an economics problem, and we have an economic model that describes the output given the inputs and the technology, the big question is: what can we do with it? To explore the application of this theory to IIR, we perform an economic analysis of search.
  • Example topics: Airbus subsidies by European governments; cases of insider trading; tropical storms where people were killed.
  • Simulated interaction: the aim is to determine the minimum inputs for the desired output, and thus obtain the production function. To explore the range of possible user strategies, we examine all the combinations of inputs. Queries of length 3 were generated for each topic from the relevant documents, i.e. to create high-quality queries. A session comprised a series of queries and a given assessment depth, and ended when the desired gain was achieved. A best-first approach was used to obtain an approximation of an empirical production function.
  • The blue and purple lines converge, because I stopped the simulation when ncg > 0.2, and not (ncg > 0.2 and 0.2 and >0.4. I really should have fixed the simulation and stopped when the gain was greater than 0.25, so that the production curve for 0.2 gain would stop at about A = 75.
  • So far we have only examined empirical estimates of the production curves/functions. It would be good if we could fit a mathematical function to these curves to have a well-defined model.
  • So far we have empirically estimated the production function. However, it is common in economics to fit a functional form to the production function, so that we can describe the production process mathematically.
  • So far, we have obtained a model which, given the inputs and a particular technology, estimates the total cumulative gain. However, since we want to determine which strategy minimizes the user's cost, we need to formulate a cost function to represent the cost of interaction. We assume that the cost of assessing one document is equal to one.
  • Now that we have a way to frame the interaction between a user and a system when searching, we can hypothesise about the user's behavior if variables or parameters in the model change. For example…
  • Insert graph here from ECON-IIR paper.

Presentation Transcript

  • The Economics in Interactive Information Retrieval. Leif Azzopardi, http://www.dcs.gla.ac.uk/~leif
  • Interaction, Cost, Benefit
  • Interactive and Iterative Search: a simplified, abstracted representation. [Diagram labels: Information Need, User, Queries, System, Documents Returned, Relevant Information]
  • From Observational & Empirical to Theoretical & Formal: Berry Picking, ASK, IS&R Framework, Information Foraging Theory (Pirolli, 1999)
  • Theoretical & Formal: A Major Research Challenge. Interactive Information Retrieval needs formal models to:
    • describe, explain and predict the interaction of users with systems,
    • provide a basis on which to reason about interaction,
    • understand the relationships between interaction, performance and cost,
    • help guide the design, development and research of information systems, and
    • derive laws and principles of interaction.
    Belkin (2008), Jarvelin (2011)
  • How do users behave? Why do users behave like this?
    • User queries tend to be short (only 2-3 terms)
    • Web searchers typically only examine the first page of results
    • Users will often pose a series of short queries
    • Patent searchers typically examine 100-200 documents per query (using a Boolean system)
    • Users adapt to degraded systems by issuing more queries
    • Patent searchers usually express longer and complex queries
    • Users rarely provide explicit relevance feedback
  • So why do users pose short queries? User queries tend to be short, but longer queries tend to be more effective!
  • So why do users pose short queries? [Figure: Total Performance and Marginal Performance against Query Length (No. of Terms)] Exponentially diminishing returns kick in after 2 query terms. Around 2-3 terms is where the user gets the most bang for their buck. Azzopardi (2009)
  • How can we use microeconomics to model the search process?
  • Microeconomics: Production Theory, Consumer Theory, Utility Maximization, Cost Minimization
  • Production Theory, a.k.a. Theory of Firms. Inputs: Capital, Labor. Output: Widgets. [Diagram labels: The Firm, Utilizes, Constrains, Technology] Varian (1987)
  • Production Functions. [Figure: Production Function, Capital against Labor]
  • Production Functions. Quantity = F(Capital, Labor). [Figure: Production Function curves for Quantity 1, Quantity 2 and Quantity 3, Capital against Labor]
  • Production Functions. [Figure: Production Set and Production Function curves for Quantity 1, Quantity 2 and Quantity 3, Capital against Labor]
  • Production Functions. Technology constrains the production set. [Figure: Production Set and Production Function curves for Quantity 1, Quantity 2 and Quantity 3, Capital against Labor]
  • Applying Production Theory to Interactive Information Retrieval
  • Interactive and Iterative Search: a simplified, abstracted representation. [Diagram labels: Information Need, User, Queries, System, Documents Returned, Relevant Information]
  • Search as Production. Inputs: Queries, Relevance Assessments. Output: Gain. Technology: Search Engine. [Diagram labels: The Firm, Utilizes, Constrains, Technology]
  • Search Production Function: Gain = F(Q, A). The function represents how well a system could be used, i.e. the minimum input required to achieve that level of gain. [Figure: curves for Gain = 10, Gain = 20 and Gain = 30, No. of Queries (Q) against No. of Assessments per Query (A)]
  • What strategies can the user employ when interacting with the search system to achieve their end goal? Lots of queries, few assessments? Few queries, lots of assessments? Or some other way? What is the most cost-efficient way for a user to interact with an IR system?
  • Modeling: caveats of an economic model of the search process. Abstracted, Simplified, Representative.
  • What does the model tell us about search & interaction?
  • Search Scenario
    • Task: Find news articles about ….
    • Goal: To find a number of relevant documents and reach the desired level of Cumulative Gain.
    • Output: Total Cumulative Gain (G) across the session
    • Inputs:
      – Y: No. of Queries, and
      – X: No. of Assessments per Query
    • Collections:
      – TREC News Collections (AP, LA, Aquaint)
      – Each topic had about 30 or more relevant documents
    • Simulation: built using C++ and the Lemur IR toolkit
  • Simulating User Interaction. Collection: TREC Aquaint with TREC topics. Models: Probabilistic, Vector Space, Boolean. Queries of length 3 are generated from the set of TREC documents marked relevant. The simulated user issues Y queries, selecting the best query first/next, and assesses X documents per query. X and Y are recorded for each level of gain. The simulation assumes the user has perfect information, in order to find out how well the system could be used.
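The best-first selection described on this slide can be illustrated with a toy sketch. It is not the author's C++/Lemur simulation: the function name `queries_needed`, the `TOY_RESULTS` data and the use of a plain count of relevant documents as "gain" are all assumptions made for illustration. Each candidate query is represented by its ranked result list, with a document id where the result is relevant and `None` where it is not.

```python
def queries_needed(ranked_lists, depth, target_gain):
    """Greedy best-first session: at each step issue the remaining query whose
    top `depth` results contain the most relevant documents not yet found.
    Returns the number of queries issued, or None if the target is unreachable."""
    found = set()
    remaining = list(ranked_lists)
    issued = 0

    def new_relevant(results):
        # Relevant documents within the assessment depth that are still unseen.
        return {doc for doc in results[:depth] if doc is not None} - found

    while len(found) < target_gain and remaining:
        best = max(remaining, key=lambda results: len(new_relevant(results)))
        gained = new_relevant(best)
        if not gained:
            return None  # no remaining query adds anything at this depth
        found |= gained
        remaining.remove(best)
        issued += 1
    return issued if len(found) >= target_gain else None


# Hypothetical result lists for three candidate queries (illustrative data only).
TOY_RESULTS = [
    ["r1", None, "r2", None, "r3"],
    [None, "r4", None, None, "r5"],
    ["r2", "r6", None, None, None],
]

for depth in (1, 2, 5):
    print(depth, queries_needed(TOY_RESULTS, depth, target_gain=4))
```

Recording the number of queries needed at each depth, for a fixed gain target, gives one point per depth on an empirical production curve of the kind shown on the following slides.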
  • Search Production Curves: Same Retrieval Model, Different Gain. TREC Aquaint Collection. [Figure: curves for BM25 NCG=0.2 and BM25 NCG=0.4, No. of Queries against No. of Assessments per Query] To double the gain requires more than double the no. of assessments.
    • 8 Q & 15 A/Q gets NCG = 0.4
    • 4 Q & 40 A/Q gets NCG = 0.4
    • 7.7 Q & 5 A/Q gets NCG = 0.2
    • 3.6 Q & 15 A/Q gets NCG = 0.2
  • Search Production Curves: Different Retrieval Models, Same Gain. TREC Aquaint Collection. [Figure: curves for BM25 NCG=0.4, BOOL NCG=0.4 and TFIDF NCG=0.4, No. of Queries against No. of Assessments per Query] No input combinations with depth less than the marked point are technically feasible! For the same gain, BOOL and TFIDF require a lot more interaction. BM25 provides more strategies (i.e. input combinations) than BOOL or TFIDF. User adaption: more queries on the degraded systems.
    • BM25: 5 Q @ 25 A/Q
    • BOOL: 10 Q @ 25 A/Q
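The "more than double" claim on the same-model slide can be checked directly from the four points quoted on it. The sketch below assumes the second number in each pair is the number of assessments per query, so the total number of documents assessed is queries times depth; the names `POINTS` and `total_assessments` are illustrative.

```python
# (queries issued, assessments per query) pairs read off the slide, keyed by NCG.
POINTS = {
    0.2: [(7.7, 5), (3.6, 15)],
    0.4: [(8.0, 15), (4.0, 40)],
}


def total_assessments(queries, depth):
    """Total documents assessed in a session of `queries` queries at `depth` each."""
    return queries * depth


for ncg, combos in POINTS.items():
    for queries, depth in combos:
        total = total_assessments(queries, depth)
        print(f"NCG={ncg}: {queries} Q @ {depth} A/Q -> {total:.1f} assessments")

# Compare the two strategies that share a depth of 15 assessments per query.
low = total_assessments(3.6, 15)
high = total_assessments(8.0, 15)
ratio = high / low
print(f"Doubling NCG at depth 15 multiplies total assessments by {ratio:.2f}")
```

At a fixed depth of 15, going from NCG = 0.2 to NCG = 0.4 takes the session from about 54 to 120 assessed documents, a factor of roughly 2.2.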
  • Search Production Function: Cobb-Douglas Production Function
    $f(Q, A) = K \cdot Q^{\alpha} \cdot A^{(1-\alpha)}$
    where Q is the no. of queries issued, A is the no. of assessments per query, α is the mixing parameter determined by the technology, and K is the efficiency of the technology used.
    Example values on Aquaint when NCG = 0.6:
    Model   K      α      Goodness of Fit
    BM25    5.39   0.58   0.995
    BOOL    3.47   0.58   0.992
    TFIDF   1.69   0.50   0.997
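As a minimal sketch, the fitted function can be written down and evaluated with the K and α values from the table on this slide. The class name `CobbDouglas`, its methods and the `MODELS` dictionary are illustrative assumptions; the slide does not state the units of the fitted output, so the numbers printed are only meaningful relative to one another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CobbDouglas:
    k: float      # efficiency of the technology
    alpha: float  # mixing parameter

    def gain(self, queries, depth):
        """f(Q, A) = K * Q^alpha * A^(1 - alpha)."""
        return self.k * queries ** self.alpha * depth ** (1 - self.alpha)

    def depth_for(self, gain, queries):
        """Invert f along a fixed-gain curve: the depth A needed for `gain` with Q queries."""
        return (gain / (self.k * queries ** self.alpha)) ** (1 / (1 - self.alpha))


# Fitted values quoted on the slide (Aquaint, NCG = 0.6).
MODELS = {
    "BM25": CobbDouglas(k=5.39, alpha=0.58),
    "BOOL": CobbDouglas(k=3.47, alpha=0.58),
    "TFIDF": CobbDouglas(k=1.69, alpha=0.50),
}

for name, model in MODELS.items():
    print(name, round(model.gain(queries=5, depth=25), 2))
```

For the same inputs the larger K of BM25 yields more output than BOOL, which in turn yields more than TFIDF, matching the ordering of the production curves.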
  • Using the Cobb-Douglas Search Function. We can differentiate the function to find the rates of change of the input variables.
    • Marginal Product of Querying, $\partial f(Q, A) / \partial Q$: the change in gain over the change in querying, i.e. how much more gain do we get if we pose extra queries?
    • Marginal Product of Assessing, $\partial f(Q, A) / \partial A$: the change in gain over the change in assessing, i.e. how much more gain do we get if we assess extra documents?
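For the Cobb-Douglas form given on the previous slide, the two marginal products work out as follows. This derivation is standard calculus applied to that form rather than something shown on the slides:

```latex
\begin{align*}
f(Q, A) &= K\, Q^{\alpha} A^{1-\alpha} \\
\frac{\partial f(Q, A)}{\partial Q} &= \alpha K\, Q^{\alpha-1} A^{1-\alpha} = \alpha\, \frac{f(Q, A)}{Q} \\
\frac{\partial f(Q, A)}{\partial A} &= (1-\alpha) K\, Q^{\alpha} A^{-\alpha} = (1-\alpha)\, \frac{f(Q, A)}{A}
\end{align*}
```

Both marginal products fall as their own input grows, which is the diminishing-returns behavior seen in the production curves.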
  • Technical Rate of Substitution: how many more assessments per query are needed if one less query is posed? TRS of Assessments for Queries: $TRS(A, Q) = \partial A / \partial Q$. [Figure: BM25 NCG=0.4 curve, No. of Queries against No. of Assessments per Query, with TRS values 0.4, 1.2, 2.5, 4.2 and 8.3 marked along the curve] At the point marked 1.2, if you gave up one query you'd need to assess 1.2 extra docs/query. Example: if 5 queries are submitted instead of 6, then 24.2 docs/query need to be assessed instead of 20 docs/query.
    • 6 Q @ 20 A/Q = 120 A
    • 5 Q @ 24.2 A/Q = 121 A
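Under the Cobb-Douglas form, holding gain fixed gives dA/dQ = −(∂f/∂Q)/(∂f/∂A) = −(α/(1−α))·(A/Q), so the size of the trade-off has a closed form. The sketch below is an illustration under that assumption: `trs` returns the magnitude (extra assessments per query for one query given up), and the α used is the BM25 value fitted at NCG = 0.6, so it is only roughly comparable with the empirical NCG = 0.4 curve on the slide.

```python
def trs(alpha, queries, depth):
    """Magnitude of dA/dQ along a fixed-gain Cobb-Douglas curve:
    extra assessments per query needed for each query given up."""
    return (alpha / (1 - alpha)) * (depth / queries)


def depth_on_curve(gain, k, alpha, queries):
    """Depth A that keeps K * Q^alpha * A^(1 - alpha) equal to `gain`."""
    return (gain / (k * queries ** alpha)) ** (1 / (1 - alpha))


ALPHA_BM25 = 0.58  # from the fitted table (NCG = 0.6), used here as an approximation

# The slide's example point: 6 queries at 20 assessments per query.
print(round(trs(ALPHA_BM25, queries=6, depth=20), 1))

# The slide's own arithmetic for the two strategies, in total documents assessed.
print(6 * 20, round(5 * 24.2, 1))
```

The fitted form suggests about 4.6 extra assessments per query at that point, in the same region as the 4.2 read off the empirical curve.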
  • What about the cost of interaction?
  • User Search Cost Function: a linear cost function
    $c(Q, A) = \beta \cdot Q + Q \cdot A$
    where Q is the no. of queries issued, A is the no. of assessments per query, Q·A is the total no. of documents assessed, and β is the relative cost of a query to an assessment.
    What is the relative cost of a query? Using cognitive costs of querying and assessing taken from Gwizdka (2010):
    • The average cost of querying was 2628 ms
    • The average cost of assessing was 2226 ms
    • So β was set to 2628/2226 = 1.1598
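A short sketch of the cost function, using the β value stated on this slide and, as an illustration, the two strategies from the Technical Rate of Substitution example. The names `BETA` and `cost` are assumptions; cost is expressed in units of one document assessment.

```python
BETA = 1.1598  # relative cost of a query, as stated on the slide


def cost(queries, depth, beta=BETA):
    """c(Q, A) = beta * Q + Q * A, in units of one document assessment."""
    return beta * queries + queries * depth


# Two strategies from the TRS example that sit on the same gain curve.
for queries, depth in [(6, 20), (5, 24.2)]:
    print(f"{queries} Q @ {depth} A/Q costs {cost(queries, depth):.2f}")
```

The two strategies cost almost the same here (about 127 units each), because the query saved is almost exactly offset by the extra document assessed.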
  • Cost Efficient Strategies: BM25, 0.4 and 0.6 Gains. [Figure: two panels against No. of Assessments per Query: No. of Queries (curves BM25@0.4 and BM25@0.6) and Cost, with the minimum cost marked] On BM25, to increase gain, pose more queries but examine the same no. of docs per query.
  • Cost Efficient Strategies: BOOL, 0.4 & 0.6 Gains. [Figure: two panels against No. of Assessments per Query: No. of Queries (curves BOOL@0.4 and BOOL@0.6) and Cost, with the minimum cost marked] On Boolean, to increase gain, issue about the same no. of queries but examine more docs per query.
  • Contrasting Systems: BM25 0.4 and 0.6 Gains versus BOOL 0.4 and 0.6 Gains. [Figure: for each system, No. of Queries and Cost against No. of Assessments per Query] On BM25, issue more queries but examine fewer docs per query. BM25 is less costly to use than BOOL.
  • A Hypothetical Experiment
    • What happens if querying costs go down? More queries issued; decrease in assessments per query.
    • What happens if querying costs go up? Decrease in queries issued; increase in assessments per query.
  • Changing the Relative Query Cost: $c(Q, A) = \beta \cdot Q + Q \cdot A$. As β increases, the relative cost of querying goes up, so it is cheaper to assess more documents per query and consequently query less! [Figure: Cost against No. of Assessments per Query]
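The effect of β can be reproduced numerically by combining the cost function with the Cobb-Douglas form. This is a sketch under assumptions, not the analysis behind the slide's plots: it uses the fitted BM25 values (K = 5.39, α = 0.58), an arbitrary gain target in the fit's units, and a grid search over integer depths. Because the fitted form only approximates the empirical curves, the absolute depths need not match the figures; the direction of change is the point.

```python
def queries_for(gain, depth, k, alpha):
    """Queries Q needed so that K * Q^alpha * A^(1 - alpha) reaches `gain` at depth A."""
    return (gain / (k * depth ** (1 - alpha))) ** (1 / alpha)


def cheapest_depth(gain, k, alpha, beta, depths=range(1, 301)):
    """Depth minimizing c(Q, A) = beta * Q + Q * A along the fixed-gain curve."""
    def session_cost(depth):
        return (beta + depth) * queries_for(gain, depth, k, alpha)
    return min(depths, key=session_cost)


K_BM25, ALPHA_BM25 = 5.39, 0.58  # fitted values quoted for BM25 at NCG = 0.6
GAIN = 30.0                       # hypothetical target, in the units of the fit

for beta in (1.1598, 5.0, 10.0):
    print(beta, cheapest_depth(GAIN, K_BM25, ALPHA_BM25, beta))
```

As β grows, the cheapest depth grows with it, so the cost-minimizing user assesses more documents per query and issues fewer queries. Under this functional form the cheapest depth does not depend on the gain target (for α above 0.5), which is consistent with the BM25 observation that higher gain is reached by posing more queries at the same depth.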
  • Implications for Design
    • Knowing how benefit, interaction and cost relate can help guide how we design systems
      – We can theorize about how changes to the system will affect the user’s interaction. Is this desirable? Do we want the user to query more? Or to assess more?
      – We can categorize the type of user. Is this a savvy, rational user? Or is this a user behaving irrationally?
      – We can scrutinize the introduction of new features. Are they going to be of any use? Are they worth it for the user? i.e. how much more performance must they give, or how little must they cost?
  • Future Directions
    • Validate the theory by conducting observational & empirical research
      – Do the predictions about user behavior hold?
    • Incorporate other inputs into the model
      – Find Similar, Relevance Feedback, Browsing,
      – Query length, Query Type, etc.
    • Develop more accurate cost functions
      – Obtain better estimates of costs
    • Model other search tasks
  • Questions? Contact Details. Email: Leifos@acm.org. Skype: Leifos. Twitter: @leifos
  • Selected References
    • Varian, H., Intermediate Microeconomics, 1987
    • Varian, H., Economics and Search, ACM SIGIR Forum, 1999
    • Pirolli, P., Information Foraging Theory, 1999
    • Belkin, N., Some (what) grand challenges of Interactive Information Retrieval, ACM SIGIR Forum, 2008
    • Azzopardi, L., Query Side Evaluation, ACM SIGIR 2009, http://dl.acm.org/citation.cfm?doid=1571941.1572037
    • Azzopardi, L., The Economics of Interactive Information Retrieval, ACM SIGIR 2011, http://dl.acm.org/citation.cfm?doid=2009916.2009923
    • Jarvelin, K., IR Research: Systems, Interaction, Evaluation and Theories, ACM SIGIR Forum, 2011
  • Search Production Function Example: G = F(X, Y). [Figure: Interaction X against Interaction Y]
  • Search Production Function Example application for web search: P@10 = F(L, A). [Figure: curves for P@10 = 0.1, P@10 = 0.2 and P@10 = 0.3, Length of Query (L) against No. of Assessments (A)]