- 1. Learning to rank fulltext results from clicks. Tomáš Kramár (@tkramar, @synopsitv)
- 2. Let's build a fulltext search engine: Query → Find matches → Rank results.
- 3. Finding matches: ● ElasticSearch ● LIKE %% ● ...
- 4. Ranking results: ● by number of hits ● by PageRank ● by date ● ...
- 5. How do you choose relevant results?
- 6. Feature                     Result 1        Result 2
     Number of keywords in title 2               2
     Number of keywords in text  2               0
     Domain                      careerjet.sk    vienna-rb.at
     Category                    Job search      Programming
     Language                    Slovak          English
- 7. Document feature           How much I care about it (the higher, the more I care)
     # keywords in title          2.1
     # keywords in text           1
     Domain is careerjet.sk      -2
     Domain is vienna-rb.at       3.5
     Category is Job Search      -1
     Category is Programming      4.2
     Language is Slovak           0.9
     Language is English          1.5
- 8. rank = d . u

     Document feature           Weight   Doc 1   Doc 2
     # keywords in title          2.1      2       2
     # keywords in text           1        2       0
     Domain is careerjet.sk      -2        1       0
     Domain is vienna-rb.at       3.5      0       1
     Category is Job Search      -1        1       0
     Category is Programming      4.2      0       1
     Language is Slovak           0.9      1       0
     Language is English          1.5      0       1
                                        = 4.1   = 13.3
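The rank = d . u computation from this slide can be reproduced directly. Below is a minimal sketch where the feature vectors encode the table rows in order (keyword counts first, then one-hot domain, category, and language); note that with these exact weights the second document sums to 13.4 rather than the 13.3 shown on the slide, so treat the numbers as illustrative.

```python
# Feature order follows the table above: [# kw in title, # kw in text,
# domain=careerjet.sk, domain=vienna-rb.at, cat=Job Search, cat=Programming,
# lang=Slovak, lang=English].
u = [2.1, 1.0, -2.0, 3.5, -1.0, 4.2, 0.9, 1.5]  # user preferences
d1 = [2, 2, 1, 0, 1, 0, 1, 0]  # the job-search document
d2 = [2, 0, 0, 1, 0, 1, 0, 1]  # the programming document

def rank(d, u):
    """rank = d . u, a plain dot product of document features and preferences."""
    return sum(di * ui for di, ui in zip(d, u))

print(round(rank(d1, u), 1))  # 4.1
print(round(rank(d2, u), 1))  # 13.4 (the slide shows 13.3)
```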
- 9. Rate each result on a scale of 1-5.
- 10. rating = d . u = d1 . u1 + d2 . u2 + ... + dn . un

      d1,1 . u1 + d1,2 . u2 + ... + d1,n . un = 3
      d2,1 . u1 + d2,2 . u2 + ... + d2,n . un = 5
      d3,1 . u1 + d3,2 . u2 + ... + d3,n . un = 1
      d4,1 . u1 + d4,2 . u2 + ... + d4,n . un = 3
- 11. rating = d . u = d1 . u1 + d2 . u2 + ... + dn . un

      d1,1 . u1 + d1,2 . u2 + ... + d1,n . un = 3
      d2,1 . u1 + d2,2 . u2 + ... + d2,n . un = 5
      d3,1 . u1 + d3,2 . u2 + ... + d3,n . un = 1
      d4,1 . u1 + d4,2 . u2 + ... + d4,n . un = 3

      The di,j are known; solve this system of equations and you have u. Done.
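If the explicit ratings were available, solving this system is one line of numpy. A minimal sketch with a made-up 4x4 feature matrix D (rows are the di) and the four ratings from the slide; np.linalg.lstsq also handles the realistic case where the system is overdetermined and has no exact solution:

```python
import numpy as np

# Hypothetical document feature rows d_i (invented for illustration) and
# the 1-5 ratings from the slide. Solve D u = r for the preference vector u.
D = np.array([
    [2., 2., 1., 0.],
    [2., 0., 0., 1.],
    [1., 3., 1., 0.],
    [0., 1., 0., 1.],
])
r = np.array([3., 5., 1., 3.])
u, *_ = np.linalg.lstsq(D, r, rcond=None)  # least-squares solve of D u = r
print(np.round(D @ u, 2))  # the recovered u reproduces the ratings
```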
- 12. Except... ● You don't know the explicit ratings ● User preferences change over time ● Those equations probably don't have a solution
- 13. Clicked! Assume rating 1. Not clicked. Assume rating 0.
- 14. Except... ● You don't know the explicit ratings ● User preferences change over time ● Those equations probably don't have a solution
- 15. Approximation function h(d): d → rank. h(d) = d1 . u1 + ... + dn . un = estimated_rank. If the function is good, it should make minimal errors: error = (estimated_rank - real_rank)^2
- 16. Gradient descent 1. Set user preferences (u) to arbitrary values 2. Calculate the estimated rank h(d) for each document 3. Calculate the mean square error 4. Adjust preferences u in a way that minimizes the error 5. Repeat until the error converges
- 17. [Figure: cost function, plotting mean square error against u for the "# of keywords in title" feature]
- 18. [Same figure] Calculate the derivative of the cost function at this point and it will give you the direction to move in.
- 19. Preference update: ui = ui - α . ∂J/∂ui
      α: learning rate
      ∂J/∂ui: partial derivative of the cost function J with respect to ui
- 20. (The learning rate α:) How fast you will move. Too low: slow progress. Too high: you will overshoot.
- 21. (The partial derivative:) Nothing scary. You can find these online for standard cost functions. For mean square error: (rank(d) - h(d)) . di
- 22. Gradient descent 1. Set user preferences (u) to arbitrary values 2. Calculate the estimated rank h(d) for each document 3. Calculate the mean square error 4. Adjust preferences u in a way that minimizes the error 5. Repeat until the error converges
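The five steps above can be sketched end to end on synthetic data. Here true_u is a hidden preference vector (invented for the example), ranks are the observed "real" ranks, and gradient descent on the mean square error recovers u:

```python
import numpy as np

rng = np.random.default_rng(0)
true_u = np.array([2.1, 1.0, -2.0, 3.5])
D = rng.normal(size=(50, 4))          # 50 documents, 4 features each
ranks = D @ true_u                     # the "real" ranks we observe

u = np.zeros(4)                        # 1. arbitrary initial preferences
alpha = 0.05                           # learning rate
for _ in range(500):                   # 5. repeat until the error converges
    h = D @ u                          # 2. estimated rank per document
    err = h - ranks                    # 3. error terms of the MSE
    u -= alpha * (D.T @ err) / len(D)  # 4. move u against the gradient
print(np.round(u, 3))                  # converges close to true_u
```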
- 23. Clicked! Assume rating 1. Clicked! Assume rating 1. Or? Doesn't this mean result #1 is not relevant?
- 24. Clicked! Assume nothing. Clicked! Assume it is better than #2 and #3.
- 25. What's changed? We no longer have ratings, just document comparisons (d4 > d3, d4 > d2). The cost function must be something that considers ordering, e.g. Kendall's τ (counts of concordant and discordant pairs). h is now a function of 2 parameters: h(d1, d2). But you can just do d2 - d1 and learn on that.
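The pairwise idea can be sketched with a perceptron-style update on feature differences; this is an illustrative stand-in for a real pairwise learner, with all data invented. Each training pair (winner, loser) says only "the clicked document beat this skipped one", and we learn on d_winner - d_loser:

```python
import numpy as np

rng = np.random.default_rng(1)
true_u = np.array([1.0, -2.0, 0.5])    # hidden preferences generating clicks
docs = rng.normal(size=(200, 3))       # 200 documents, 3 features each

# Build (winner, loser) pairs from the hidden true ordering.
pairs = []
for _ in range(300):
    i, j = rng.choice(len(docs), size=2, replace=False)
    if docs[i] @ true_u > docs[j] @ true_u:
        pairs.append((i, j))
    else:
        pairs.append((j, i))

u = np.zeros(3)
for w, l in pairs * 5:                 # a few passes over the pairs
    diff = docs[w] - docs[l]           # learn on d_winner - d_loser
    if diff @ u <= 0:                  # pair ordered the wrong way round
        u += 0.1 * diff                # nudge u toward the correct ordering

# Fraction of pairs u now orders correctly (the concordant pairs of τ).
correct = sum((docs[w] - docs[l]) @ u > 0 for w, l in pairs)
print(correct, "/", len(pairs))
```

The concordant-pair count at the end is exactly the quantity Kendall's τ is built from.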
