Active Learning to Rank
Presentation from ECIR 2013

Transcript

  • 1. Alexey Voropaev. Active Learning to Rank
  • 2. Search Mail.Ru
    – www.go.mail.ru
    – Search for: Russian web, images, video, etc.
    – 9% of market share
  • 3. Machine learning is everywhere. There are ML algorithms in:
    – Crawler
    – Indexer
    – Ranker
    – Data mining systems
    – Frontend
    Most of them are supervised:
    – Require a training set
    – Judgement is expensive
    – Ranker training set: 1M documents, 50K queries
  • 4. Usual problems of a training set (figure: points of class A, points of class B, unlabelled points)
  • 5. Problem: unlabelled points between the classes
  • 6. Problem: imbalance of the labelled points
  • 7. Problem: an unsampled cluster
  • 8. Idea of Active Learning: fix these problems by smart construction of the training set, saving assessor resources
  • 9. Uncertainty sampling: take the instances about which the model is least certain how to label.
    Problem: requires the posterior distribution P(Y|x).
    (Figure: point 1, lying between the classes, is selected.)
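    The selection rule on this slide can be sketched in a few lines of Python. This is a toy illustration, not the presenter's code: the logistic posterior and the pool values are made up.

    ```python
    import math

    def uncertainty_sample(posterior, unlabelled):
        """Pick the index of the unlabelled point whose posterior
        P(Y = 1 | x) is closest to 0.5 (least certain label)."""
        return min(range(len(unlabelled)),
                   key=lambda i: abs(posterior(unlabelled[i]) - 0.5))

    # Toy posterior: a logistic curve over 1-D points.
    posterior = lambda x: 1.0 / (1.0 + math.exp(-x))
    pool = [-3.0, -0.2, 2.5, 4.0]
    print(uncertainty_sample(posterior, pool))  # index 1: P(-0.2) ≈ 0.45
    ```

    This makes the slide's problem concrete: the rule is only usable when a calibrated P(Y|x) is available.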
  • 10. Query-By-Committee: take the instances on which the committee members' votes differ most.
    (Figure: point 2 is selected.)
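    The vote-margin criterion can be sketched as follows (a toy illustration with made-up threshold classifiers, not the presenter's code):

    ```python
    def qbc_sample(committee, unlabelled):
        """Query-By-Committee: pick the index of the point on which the
        committee's binary votes are most evenly split."""
        def margin(x):
            ones = sum(m(x) for m in committee)
            return abs(ones - (len(committee) - ones))  # small = disagreement
        return min(range(len(unlabelled)),
                   key=lambda i: margin(unlabelled[i]))

    # Three toy classifiers that disagree only near the boundary.
    committee = [lambda x: int(x > 0), lambda x: int(x > 1), lambda x: int(x > 2)]
    pool = [-5.0, 1.5, 7.0]
    print(qbc_sample(committee, pool))  # index 1: votes on 1.5 are [1, 1, 0]
    ```

    Unlike uncertainty sampling, this needs no posterior, only the committee's hard votes.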
  • 11. QBag algorithm (K. Dwyer, R. Holte, Decision Tree Instability and Active Learning, 2007)
    Input: T – initial labelled training set; C – committee size; A – learning algorithm; U – set of unlabelled objects
    Output: T – extended training set
    1. Uniformly resample T to obtain T1 ... TC, where |Ti| < |T|
    2. For each Ti, build model Mi using A
    3. Select x* = argmin over x ∈ U of | #{i : Mi(x) = 1} − #{i : Mi(x) = 0} |
    4. Pass x* to an assessor and update T
    5. Repeat from step 1 until convergence
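    One iteration of the QBag loop above can be sketched like this. The learner `A` (a threshold halfway between class means) and the toy data are illustrative assumptions, not the decision trees used in the cited paper:

    ```python
    import random

    def qbag_round(T, U, C, A, rng=random):
        """One QBag iteration: resample the labelled set T into C bags,
        train a committee with learner A, and return the unlabelled
        object from U with the smallest absolute vote margin."""
        committee = []
        for _ in range(C):
            bag = [rng.choice(T) for _ in range(len(T) - 1)]  # |Ti| < |T|
            committee.append(A(bag))
        def margin(x):
            ones = sum(m(x) for m in committee)
            return abs(ones - (C - ones))
        return min(U, key=margin)

    def A(bag):
        """Toy learner: threshold halfway between the class means."""
        pos = [x for x, y in bag if y == 1]
        neg = [x for x, y in bag if y == 0]
        t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 if pos and neg else 0.0
        return lambda x, t=t: int(x > t)

    T = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]   # labelled (x, label)
    U = [0.1, -5.0, 5.0]                             # unlabelled pool
    print(qbag_round(T, U, 5, A))  # 0.1 — the point nearest the boundary
    ```

    The selected point is then judged by an assessor, appended to T, and the loop repeats.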
  • 12. QBag quality (figure; K. Dwyer, R. Holte, Decision Tree Instability and Active Learning, 2007)
  • 13. Density sampling. Idea: balance the dense and sparse regions of the input space.
    (Figure: dense, sparse, and not-sampled regions of labelled and unlabelled points.)
  • 14. Clustering of our training set for ranking: a self-organizing map where each cell is a cluster and the colour is the relevance label (navigation, high relevant, medium relevant, low relevant, irrelevant, 404).
  • 15. Density of the clustering (figure: high-density vs. low-density regions)
  • 16. Result of sparse-region sampling (figure: old documents vs. new documents)
  • 17. SOM-balancing algorithm
    1. Build a clustering C for the training set
    2. Compute the average density density_avg
    3. For each cluster c ∈ C:
    4.   If density(c) > density_avg:
    5.     Limit the number of samples in c to N
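    The balancing step can be sketched as below, with cluster density taken as the number of member documents (an assumption; the function and data are illustrative, not the production SOM code):

    ```python
    from collections import defaultdict

    def som_balance(docs, cluster_of, N):
        """Keep at most N documents in every cluster that is denser than
        the average cluster density; pass sparse clusters through whole."""
        clusters = defaultdict(list)
        for d in docs:
            clusters[cluster_of(d)].append(d)
        density_avg = sum(len(m) for m in clusters.values()) / len(clusters)
        kept = []
        for members in clusters.values():
            kept.extend(members[:N] if len(members) > density_avg else members)
        return kept

    docs = list(range(10))
    cluster = lambda d: 0 if d < 8 else d   # one dense cluster, two sparse
    print(len(som_balance(docs, cluster, N=2)))  # 4: dense cluster capped at 2
    ```

    Capping only the over-dense clusters is what yields the 18% compression on the next slide without hurting DCG.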
  • 18. SOM-balancing results:
    – Training set size: 350K documents
    – Map: 300×300 clusters, N = 10
    – Compression: 18%
    – Quality: DCG original 17.20, DCG compressed 17.26
    Problem: the compression level is small
  • 19. SOM+QBag for learning to rank: clustering for initial training-set construction
    1. Build a clustering using random sampling of documents
    2. Mark all clusters as unused
    3. Select the query that covers the maximum number of unused clusters
    4. For each cluster covered by documents from that query:
    5.   Select 1 document and send it to an assessor
    6.   Mark the cluster as used
    7. Repeat from step 3 until M queries have been selected
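    The greedy construction above can be sketched as a set-cover loop. The query-to-documents mapping and the modular clustering are made-up toy inputs:

    ```python
    def initial_queries(query_docs, cluster_of, M):
        """Greedy cover: repeatedly pick the query covering the most
        still-unused clusters; for each newly covered cluster, pick one
        of its documents for judgement and mark the cluster as used."""
        used, to_judge, chosen = set(), [], []
        remaining = dict(query_docs)
        for _ in range(min(M, len(remaining))):
            q = max(remaining,
                    key=lambda q: len({cluster_of(d) for d in remaining[q]} - used))
            chosen.append(q)
            for d in remaining.pop(q):
                c = cluster_of(d)
                if c not in used:           # one document per new cluster
                    to_judge.append(d)
                    used.add(c)
        return chosen, to_judge

    queries = {"q1": [0, 1, 2], "q2": [3, 4], "q3": [6]}
    chosen, to_judge = initial_queries(queries, lambda d: d % 3, M=2)
    print(chosen[0], to_judge)  # q1 is picked first: it covers all 3 clusters
    ```

    Judging one document per newly covered cluster spreads the assessor budget across the feature space instead of concentrating it on popular queries.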
  • 20. SOM+QBag for learning to rank: application of QBag
    1. Build a committee of models for QBag
    2. Build a clustering C for the current training set
    3. Mark all clusters as unused
    4. For each query from a pool of new queries:
    5.   For each pair (d1, d2) selected by QBag:
    6.     c1 = cluster(d1), c2 = cluster(d2)
    7.     If c1 is unused OR c2 is unused:
    8.       Send d1 and d2 to assessors
    9.       Mark c1 and c2 as used
    10. Mark all clusters as unused
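    The cluster-gated filtering of QBag's candidate pairs can be sketched as follows (toy data and clustering; in the real system the pairs would come from the QBag committee and go to human assessors):

    ```python
    def select_pairs(qbag_pairs_per_query, cluster_of):
        """For each new query's QBag-selected pairs (d1, d2), send a pair
        to assessors only if at least one of its documents falls in a
        cluster not yet used in this pass; reset the marks afterwards."""
        used, judged = set(), []
        for pairs in qbag_pairs_per_query:        # pairs per new query
            for d1, d2 in pairs:
                c1, c2 = cluster_of(d1), cluster_of(d2)
                if c1 not in used or c2 not in used:
                    judged.append((d1, d2))       # "send to assessors"
                    used.update((c1, c2))
        used.clear()                              # step 10: reset marks
        return judged

    cluster = lambda d: d // 10                   # toy clustering by decade
    pairs = [[(1, 12)], [(2, 11), (25, 31)]]
    print(select_pairs(pairs, cluster))  # [(1, 12), (25, 31)]
    ```

    The pair (2, 11) is dropped because both of its clusters were already covered by (1, 12), so the assessor budget goes to unseen regions of the map.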
  • 21. SOM+QBag for learning to rank: results (all data: 300K documents; test set: 300K documents)
  • 22. Our search quality vs. main competitors (figure)
  • 23. Thank you!
    References:
    – http://active-learning.net/
    – Burr Settles. Active Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, June 2012
