OpenSource Connections

May 17, 2019
Relevance metrics like nDCG or ERR require graded judgements to evaluate query relevance performance. But what happens when we don't know what 'good' looks like ahead of time? This talk looks at using click modeling techniques to infer relevance judgements from user interaction logs.


- 1. Solving for Satisfaction: Introduction to Click Models Elizabeth Haubert OpenSource Connections April, 2019
- 2. Outline ● Measuring Relevance ● What is an Event Model? ● How do we build one? ● How do we use one?
- 3. What is Relevance? ● How are our queries performing? ○ Which queries are doing well? Which ones aren’t? ● Which content do users prefer? ● Is there better content that users will like more?
- 4. Relevance Metrics ● DCG, nDCG: Discounted Cumulative Gain ● RR: Reciprocal Rank ● Precision? Recall? Judgements!
- 5. Judgements Example, as CSV
  query, id, grade
  "red rose",1234,Awesome
  "red rose",1424,Terrible
  "fertilizer",1234,Terrible
  "fertilizer",521,Awesome
  "fertilizer",5215,Neutral
  "daisy",521,Pretty Bad
  "daisy",215,Pretty Bad
- 6. Judgements Example, as CSV
  query, id, grade
  "red rose",1234,4
  "red rose",1424,0
  "fertilizer",1234,0
  "fertilizer",521,4
  "fertilizer",5215,3
  "daisy",521,2
  "daisy",215,2
- 7. Judgements Example, as CSV
  query, id, grade
  "red rose",1234,4
  "red rose",1424,0
  "fertilizer",1234,0
  "fertilizer",521,4
  "fertilizer",5215,3
  "daisy",521,2
  "daisy",215,2
  This is a model! F(Q, D) = J
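Slides 5–7 make the point that a judgement list is itself a model: F(Q, D) = J is just a lookup table. A minimal sketch, assuming the label-to-grade mapping implied by slides 5 and 6 (the names `GRADE`, `judgements`, and `F` are illustrative, not from the talk):

```python
# Hypothetical mapping from the categorical labels on slide 5
# to the numeric grades on slide 6.
GRADE = {"Awesome": 4, "Neutral": 3, "Pretty Bad": 2, "Terrible": 0}

# The judgement list *is* the model: F(Q, D) = J, a lookup table.
judgements = {
    ("red rose", 1234): "Awesome",
    ("red rose", 1424): "Terrible",
    ("fertilizer", 1234): "Terrible",
    ("fertilizer", 521): "Awesome",
    ("fertilizer", 5215): "Neutral",
    ("daisy", 521): "Pretty Bad",
    ("daisy", 215): "Pretty Bad",
}

def F(query, doc_id):
    """Return the numeric grade J for a (query, document) pair."""
    return GRADE[judgements[(query, doc_id)]]

print(F("red rose", 1234))  # 4
```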
- 8. Probabilities Example, as CSV
  query, id, P(meets info need)
  "red rose",1234,0.9
  "red rose",1424,0.1
  "fertilizer",1234,0.1
  "fertilizer",521,0.9
  "fertilizer",5215,0.65
  "daisy",521,0.4
  "daisy",215,0.4
- 9. Expected Reciprocal Rank But when does a user stop at rank r?
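ERR answers the "when does a user stop?" question by treating each result's grade as a stop probability: ERR = Σᵣ (1/r) · Rᵣ · Πᵢ₌₁..ᵣ₋₁ (1 − Rᵢ). A sketch, assuming the P(meets info need) values from slide 8 are used directly as stop probabilities:

```python
def expected_reciprocal_rank(stop_probs):
    """ERR = sum over ranks r of (1/r) * P(user stops at rank r).
    Stopping at r requires not being satisfied at ranks 1..r-1."""
    err, p_reach = 0.0, 1.0  # p_reach: probability the user gets to rank r
    for r, p_stop in enumerate(stop_probs, start=1):
        err += p_reach * p_stop / r
        p_reach *= (1.0 - p_stop)
    return err

# "red rose" ranked [1234, 1424] with P(meets info need) from slide 8:
# ERR = 1 * 0.9 + (1/2) * (1 - 0.9) * 0.1 = 0.905
print(expected_reciprocal_rank([0.9, 0.1]))
```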
- 10. Outline ● Measuring Relevance ● What is an Event Model? ● How do we build one? ● How do we use one?
- 11. https://commons.wikimedia.org/wiki/File:American_quarter_horse.jpg
- 12. https://allpainters.org/paintings/study-of-horses-edgar-degas.html#gallery
- 13. https://www.google.com/search?q=child%27s+sketch+unicorn&rlz=1C5CHFA_enUS729US732&source=lnms&tbm=isch&sa=X&ved=0ahUKEwithsmI--XhAhVPSN8KHbsyB4kQ_AUIDigB&biw=1440&bih=623#imgrc=klRNFsu87Y
- 14. Event Models Definition: A mathematical expression defining the relationship between SERP features and a user event ● F(Q,D) = P(Event = 1) ○ F(Q,D) = ρ ○ F(Q,D) = ρ_r ○ F(Q,D) = Σ P(Event_r = 1 | other_event = 1)
- 15. Event Models 1. The events / variables considered 2. A dependency graph describing the relationships 3. The conditional probabilities along the edges in the graph 4. The correspondence between the model’s parameters and SERP and query features (Chuklin, Markov, de Rijke (2015) Click Models for Web Search. Morgan & Claypool Publishers.)
- 16. Events
- 17. Events: Impressions ● Three results shown at ranks 1, 2, 3 → 3 impressions
- 18. Events: Clicks ● Three results shown, one result clicked → 1 click
- 19. Dependency Graph ● Impression → Attracted → Click
- 20. Probabilities ● P(E = 1) = P(Impression) = ε ○ What is the probability a user saw this doc? ● P(A = 1) = P(Attracted) = α ○ What is the probability a user is interested in this document? ● P(Click = 1) = P(Impression ∩ Attracted)
- 21. Solving for Attractiveness ● P(Click = 1) = P(Impression ∩ Attracted) ● Assuming impression and attraction are independent: P(Click = 1) = P(Impression) × P(Attracted) ● So: P(Attracted) = P(Click = 1) / P(Impression)
- 22. Attractiveness

  | Position in SERP | Number of clicks | Number of impressions | P(E=1) | P(C=1) | Attractiveness score for this (query, doc) |
  |---|---|---|---|---|---|
  | 1 | 20 | 200 | 200/200 | 20/200 | 20/200 = 0.1 |
  | 4 | 20 | 100 | 100/200 | 20/200 | 20/100 = 0.2 |
  | 12 | 20 | 50 | 50/200 | 20/200 | 20/50 = 0.4 |
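The arithmetic on slide 22 can be sketched directly: examination probability from impressions, click probability from clicks, and attractiveness as their ratio (the 200 total sessions and the function name are taken from / invented for the slide's example):

```python
SESSIONS = 200  # total query sessions behind the counts on slide 22

def attractiveness(clicks, impressions):
    """P(Attracted) = P(Click = 1) / P(Impression), which reduces
    to clicks / impressions since the session count cancels."""
    p_examined = impressions / SESSIONS   # P(E = 1)
    p_clicked = clicks / SESSIONS         # P(C = 1)
    return p_clicked / p_examined

# (position, clicks, impressions) rows from slide 22
for rank, clicks, impr in [(1, 20, 200), (4, 20, 100), (12, 20, 50)]:
    print(rank, attractiveness(clicks, impr))  # 0.1, 0.2, 0.4
```

The same 20 clicks look four times more "attractive" at rank 12 than at rank 1, because far fewer users ever examined rank 12.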
- 23. Click Through Rate (CTR) Click-Through Rate is the ratio of users who click on a specific link to the number of total users who view a page, email, or advertisement (2019, Jan 28) Click-Through Rate. https://en.wikipedia.org/wiki/Click-through_rate
- 24. More Events: Attractiveness vs Satisfaction Attractiveness User is sufficiently interested to click the document Satisfaction Document meets user’s information need
- 25. Probabilities with Satisfaction ● P(E = 1) = P(Impression) = ε ○ What is the probability a user saw this doc? ● P(A = 1) = P(Attracted) = α ○ What is the probability a user is interested in this document? ● P(Click = 1) = αε ● P(S = 1) = P(Satisfaction) = σ
- 26. Dependency Graph with Observed Conversions ● Impression → Attracted → Click → Satisfied
- 27. Conversion Rate Conversion Rate is the ratio of users who complete a specific task to the number of total users who view a page, email, or advertisement (2019) Conversion Rate. https://www.optimizely.com/optimization-glossary/conversion-rate/
- 28. Dynamic Bayesian Network Model (DBN) ● At each rank: Impression → Attracted → Click → Satisfied ● Examination chains from one rank to the next
- 29. DBN Equations (r = rank, u = user, q = query) ● C_r = 1 ⇔ E_r = 1 and A_r = 1 ● P(A_r = 1) = α_rq ● P(E_1 = 1) = 1 ● P(E_r = 1 | E_(r-1) = 0) = 0 ● P(S_r = 1 | C_r = 1) = σ_rq ● P(E_r = 1 | S_(r-1) = 1) = 0 ● P(E_r = 1 | E_(r-1) = 1, S_(r-1) = 0) = γ
- 30. DBN Click Probabilities ● P(C_rq = 1) = α_rq · ε_r, where ε_(r+1) = ε_r · γ · (α_rq(1 − σ_rq) + (1 − α_rq)) ● So α, ε, σ, γ are parameters to the model ● α, ε can be observed (or approximated with observations) ● γ must be estimated
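The click-probability recursion on slide 30 can be sketched as a short loop; the parameter values below are made up for illustration:

```python
def dbn_click_probs(alpha, sigma, gamma):
    """P(C_r = 1) = alpha_r * eps_r, with the examination recursion
    eps_{r+1} = eps_r * gamma * (alpha_r * (1 - sigma_r) + (1 - alpha_r))."""
    eps, probs = 1.0, []  # the first result is always examined: eps_1 = 1
    for a, s in zip(alpha, sigma):
        probs.append(a * eps)
        # User continues only if not satisfied here, with persistence gamma.
        eps *= gamma * (a * (1 - s) + (1 - a))
    return probs

# Hypothetical attractiveness/satisfaction parameters for a 3-result SERP
print(dbn_click_probs(alpha=[0.9, 0.5, 0.3], sigma=[0.7, 0.2, 0.1], gamma=0.9))
```

Note how examination decays fastest when early results are both attractive and satisfying: satisfied users stop scrolling.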
- 31. Solving for DBN Satisfaction
- 32. http://www.howtodrawanimals.net/ how-to-draw-a-horse
- 33. Solving For Parameters ● MLE: Maximum Likelihood Estimator ○ Simple models, parameters are something we can solve for ● EM Estimation ○ DBN, not so much. ○ Probability of Satisfaction depends on precursors ○ Iterative, model-fitting approach
- 34. EM Intuition ● Try to guess the parameters ● Try to measure the likelihood that the samples you have came from a distribution with those parameters ● Rinse, Lather, Repeat https://www.python-course.eu/expectation_maximization_and_gaussian_mixture_models.php
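The guess/measure/repeat loop on slide 34 can be sketched with the same example as the linked tutorial: EM for a two-component 1-D Gaussian mixture. This is a generic illustration of EM, not the DBN fitting procedure itself, and all names and data are invented:

```python
import math, random

def em_two_gaussians(xs, iters=50):
    """Guess parameters; compute responsibilities (E-step);
    re-estimate parameters from them (M-step); repeat."""
    mu = [min(xs), max(xs)]  # initial guess for the two means
    var = [1.0, 1.0]
    pi = [0.5, 0.5]          # mixture weights
    for _ in range(iters):
        # E-step: how likely is each point to come from each component?
        resp = []
        for x in xs:
            w = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in (0, 1)]
            total = w[0] + w[1]
            resp.append([w[0] / total, w[1] / total])
        # M-step: re-estimate parameters from the responsibilities
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            pi[k] = nk / len(xs)
    return mu

random.seed(0)
xs = ([random.gauss(0, 1) for _ in range(200)]
      + [random.gauss(5, 1) for _ in range(200)])
print(sorted(em_two_gaussians(xs)))  # recovered means near 0 and 5
```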
- 35. Evaluating Models https://brothersenroute.com/2015/04/23/cadence-thoughts-and-spherical-horses-1-of-3/
- 36. Evaluating Models ● How well did the model fit the observed queries? ○ Mean Squared Error ○ Log-Likelihood ● How well does the model fit repeated sample sets? ○ Test-Retest http://www.socialresearchmethods.net/kb/reltypes.php
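The two fit measures on slide 36 can be sketched for predicted vs. observed click behavior (function names and data are illustrative):

```python
import math

def mse(predicted, observed):
    """Mean squared error between predicted and observed click rates."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)

def log_likelihood(predicted, clicks):
    """Log-likelihood of binary click outcomes under the model's P(C=1).
    Closer to 0 means a better fit."""
    return sum(math.log(p) if c else math.log(1 - p)
               for p, c in zip(predicted, clicks))

p = [0.9, 0.2, 0.4]              # model's predicted click probabilities
observed_rate = [0.85, 0.25, 0.4]  # click rates actually observed
print(mse(p, observed_rate))
print(log_likelihood(p, [1, 0, 1]))
```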
- 37. Outline ● Measuring Relevance ● What is an Event Model? ● How do we build one? ● How do we use one?
- 38. 1. Generate Document Probabilities Example, as CSV
  query, id, P(Attracted), P(Satisfied)
  "red rose",1234,0.9,0.75
  "red rose",1424,0.1,0.0001
  "fertilizer",1234,0.1,0.0001
  "fertilizer",521,0.9,0.8
  "fertilizer",5215,0.65,0.65
  "daisy",521,0.4,0.2
  "daisy",215,0.4,0.1
- 39. Relevance Metrics Revisited ● MAP@n: Mean Average Precision ○ Usually used with binary (0/1) gradations of relevance ○ 0, 1, 1, 0, 0, 1 → 3/6 = 0.5 ● MAP@n with Probabilities: ○ 0.9, 0.9, 0.4, 0.1, 0.9 → 3.2/5 = 0.64
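The arithmetic on slide 39 is the same operation in both cases, a mean of the grades in the top n, whether those grades are binary judgements or model probabilities (the function name is illustrative):

```python
def precision_at_n(grades):
    """Mean of the relevance grades in the top n results.
    Works for binary 0/1 judgements and for probabilities alike."""
    return sum(grades) / len(grades)

print(precision_at_n([0, 1, 1, 0, 0, 1]))         # 3/6 = 0.5
print(precision_at_n([0.9, 0.9, 0.4, 0.1, 0.9]))  # 3.2/5 = 0.64
```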
- 40. 2. Regenerating Models ● Individual documents don’t have a fixed relevance ○ Freshness ○ Seasonality ● This approach solves for model parameters ○ P(click @ r) ● Have model parameters changed over time? ○ Test/retest correlation ○ Hypothesis testing ● Can we set an alert on that? ○ TBD
- 41. 3. Which documents are performing well? When is the observed rate of an event > the calculated P(Event)? ● This is something we can run a hypothesis test on ● This is something for which we can predict the required sample size
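Slide 41's check, whether the observed event rate is significantly above the model's predicted P(Event), can be sketched as a one-sided z-test using the normal approximation to the binomial (the counts and the 5% threshold are illustrative):

```python
import math

def z_score(events, trials, p_expected):
    """One-sided test: is the observed rate above the model's P(Event)?"""
    p_hat = events / trials
    se = math.sqrt(p_expected * (1 - p_expected) / trials)
    return (p_hat - p_expected) / se

# Model says P(Click) = 0.1 for this (query, doc);
# we observed 35 clicks in 200 trials.
z = z_score(35, 200, 0.10)
print(z, z > 1.645)  # z > 1.645 ~ significant at the 5% level, one-sided
```

The same standard-error formula, run in reverse, gives the required sample size for a detectable difference at a chosen significance level.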
- 42. Summary ● Categorical Judgements are good for human raters ● Probabilistic measurements are better for computer raters ● These probabilities are consistent with standard relevance measures ● These probabilities are also consistent with standard e-commerce measures ● If we have click logs, we can solve for simple models directly ● There are tools to estimate parameters in more complex models
- 43. Bibliography ● Chuklin, Markov, de Rijke (2015) Click Models for Web Search ○ Tooling: https://github.com/varepsilon/clickmodels ● Katsov (2018) Introduction to Algorithmic Marketing ● Krzanowski (1998) An Introduction to Statistical Modelling ● Schuth (2016) Search Engines That Learn From Their Users. EM References: ● https://www.python-course.eu/expectation_maximization_and_gaussian_mixture_models.php ● https://ibug.doc.ic.ac.uk/media/uploads/documents/expectation_maximization-1.pdf
