# Lecture "Yandex Search Architecture"


1. 1. Contents: measuring search quality; constructing the ranking function; problems in constructing the ranking function.
2. 2. The search quality problem. Task: construct a ranking function that orders documents by how well they match the search query. Given:
   - a set of queries $Q = \{q_1, \dots, q_n\}$;
   - a set of documents for each query, $q \to \{d_1, \dots, d_t\}$;
   - relevance judgments for <query, document> pairs, $\mathrm{rel}(q, d) \in [0, 1]$.

   Unknown: $f(q, d)$.
4. 4. Measuring search quality. Search quality is estimated by averaging a quality metric over the query set $Q$:

   $$\mathrm{Quality}(f(q, d)) = \frac{1}{n} \sum_{q_i \in Q} \mathrm{Measure}(\text{rank for } q_i)$$

   Examples of search quality metrics:
   - Precision-10: the number of documents with relevance greater than 0.5 in the top 10.
6. 6. Examples of metrics.

   DCG (discounted cumulative gain):

   $$\mathrm{DCG}(\text{order for } q_i) = \sum_{j=1}^{N_q} \frac{\mathrm{rel}(q_i, d_j)}{1 + \log_2 j}$$

   nDCG (normalized discounted cumulative gain):

   $$\mathrm{nDCG}(\text{rank for } q_i) = \frac{\mathrm{DCG}(\text{rank for } q_i)}{\mathrm{DCG}(\text{ideal rank for } q_i)}$$
7. 7. Constructing the ranking function. For each <query, URL> pair, a set of factors describing how well the document matches the query is computed: $(q, d) \to (\mathrm{factor}_1(q, d), \dots, \mathrm{factor}_{100500}(q, d))$. For example:
   - $TR$: statistics of query-word occurrences in the document text;
   - $LR$: statistics of query-word occurrences in links to the document;
   - $PR$: the document's weight in the PageRank model.
8. 8. Constructing the ranking function. The ranking function is a function of the set of factors. The documents for a query are ordered by the value of the ranking function. The task is to construct the function $f(q, d)$ with the maximum value of the quality measure:

   $$f^{\text{good}}(q, d) = \arg\max\bigl(\mathrm{Quality}(f(q, d))\bigr)$$
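10. 10. Problems in constructing the ranking function. The function $f(q, d)$ is a smooth function of some set of parameters:

    $$f(q, d) = \alpha_1 \cdot PR + \alpha_2 \cdot TR \cdot LR \dots$$

    $\mathrm{DCG}(\dots)$, however, is a discontinuous function of those parameters, because it depends on them only through the resulting document order. Ordinary gradient-based optimization methods therefore cannot be applied. The quality metric $\mathrm{DCG}(\dots)$ has to be smoothed.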
12. 12. Smoothing methods. For the documents of query $q$, the values of the ranking function $f(q, d)$ are computed: $d_1 \to fl_1 = f(q, d_1), \dots, d_t \to fl_t = f(q, d_t)$. These values $(fl_1, \dots, fl_t)$ are assumed to induce a probability distribution over all permutations $S_t$ of the query's documents. The smoothed metric is computed as the expected value of DCG under this distribution:

    $$\mathrm{appDCG} = \sum_{\mathrm{ord} \in S_t} \mathrm{DCG}(\mathrm{ord}) \cdot \mathrm{Prob}(\mathrm{ord} \mid (fl_1, \dots, fl_t))$$
13. 13. Probability functions. The Luce-Plackett model (the ListNet algorithm, 2007). The probability of a given document order $(d_{i_1}, \dots, d_{i_t})$ is

    $$\mathrm{Prob}((d_{i_1}, \dots, d_{i_t}) \mid (fl_1, \dots, fl_t)) = \prod_{j=1}^{t-1} \frac{fl_{i_j}}{\sum_{k=j}^{t} fl_{i_k}}$$

    Drawback: the appDCG sum has $t!$ terms, which makes computing it in reasonable time "difficult".
14. 14. Probability functions. TieRank, 2011 (A. Kustarev, I. Segalovich). The final values of the ranking function are assumed to take only a finite set of values $0 \le a_1 < a_2 < \dots < a_m \le 1$. If the value of $f(q, d)$ differs from the numbers in this set and falls in some interval between them, $a_i < f(q, d) < a_{i+1}$, then the value of the ranking function for the document is taken to be $a_i$ with probability $\frac{f(q, d)}{a_{i+1} - a_i}$ and $a_{i+1}$ with probability $\frac{a_{i+1} - f(q, d)}{a_{i+1} - a_i}$. For these two probabilities to sum to 1, $f(q, d)$ in the first numerator has to be read as the offset $f(q, d) - a_i$ from the left end of the interval.
15. 15. Probability functions. TieRank, 2011 (A. Kustarev, I. Segalovich). For this model, the sum

    $$\mathrm{appDCG} = \sum_{\mathrm{ord} \in S_t} \mathrm{DCG}(\mathrm{ord}) \cdot \mathrm{Prob}(\mathrm{ord} \mid (fl_1, \dots, fl_t))$$

    has at most $2^t$ terms (each of the $t$ documents takes one of two values), far fewer than $t!$. The resulting function is continuous but has a discontinuous derivative. To fix this, a kernel function is used:

    $$\varphi(x) = 2x^3 - 3x^2 + 1, \qquad \frac{f(q, d)}{a_{i+1} - a_i} \to \varphi\!\left(\frac{f(q, d)}{a_{i+1} - a_i}\right)$$
16. 16. It worked!!!
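
The metrics from slides 4 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function names are made up for the example, and each ranking is assumed to be given as a list of relevance judgments already sorted in the order the ranking function produced.

```python
import math
from typing import Callable, Sequence


def precision_at_10(rels: Sequence[float], threshold: float = 0.5) -> int:
    """Number of documents with relevance above `threshold` in the top 10."""
    return sum(1 for r in rels[:10] if r > threshold)


def dcg(rels: Sequence[float]) -> float:
    """DCG with the discount 1 + log2(j), where j is the 1-based position."""
    return sum(r / (1.0 + math.log2(j)) for j, r in enumerate(rels, start=1))


def ndcg(rels: Sequence[float]) -> float:
    """DCG of the given order divided by DCG of the ideal (descending) order."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


def quality(rankings: Sequence[Sequence[float]],
            measure: Callable[[Sequence[float]], float]) -> float:
    """Average of a per-query measure over all queries."""
    return sum(measure(r) for r in rankings) / len(rankings)
```

For instance, `quality([[1.0, 1.0], [0.0, 1.0]], dcg)` averages the DCG of two queries, each with two judged documents.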
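
The claim on slide 10 that DCG is a discontinuous function of the parameters can be illustrated with a small sketch. The two documents, their factor values, and the helper names below are hypothetical and chosen only so that the document order flips at $\alpha_2 = 1$. The ranking function uses just the two terms written out on the slide.

```python
import math
from typing import Sequence

# Hypothetical documents for one query: (PR, TR, LR, relevance).
DOCS = [
    (0.9, 0.1, 0.2, 0.0),  # high PageRank, poor text match, irrelevant
    (0.2, 0.8, 0.9, 1.0),  # low PageRank, good text match, relevant
]


def dcg(rels: Sequence[float]) -> float:
    """DCG with the discount 1 + log2(j), where j is the 1-based position."""
    return sum(r / (1.0 + math.log2(j)) for j, r in enumerate(rels, start=1))


def rank_score(alpha1: float, alpha2: float,
               pr: float, tr: float, lr: float) -> float:
    """f(q, d) = alpha1 * PR + alpha2 * TR * LR (only the terms on the slide)."""
    return alpha1 * pr + alpha2 * tr * lr


def dcg_for_alpha(alpha2: float, alpha1: float = 1.0) -> float:
    """Rank DOCS by f(q, d) for the given parameters and return the DCG."""
    ranked = sorted(
        DOCS,
        key=lambda d: rank_score(alpha1, alpha2, d[0], d[1], d[2]),
        reverse=True,
    )
    return dcg([d[3] for d in ranked])
```

As `alpha2` moves from 0.99 to 1.01 the scores change smoothly, but the two documents swap places and the DCG jumps from one constant value to another. Everywhere else the gradient of DCG with respect to `alpha2` is zero, which is why gradient methods get no signal.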
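
The smoothed metric from slides 12 and 13 can be computed by brute force for small $t$. The sketch below assumes strictly positive scores $fl_i$ and uses illustrative function names. It enumerates all $t!$ permutations, which is exactly the cost the lecture calls "difficult", so it is only usable for a handful of documents.

```python
import math
from itertools import permutations
from typing import Sequence


def dcg(rels: Sequence[float]) -> float:
    """DCG with the discount 1 + log2(j), where j is the 1-based position."""
    return sum(r / (1.0 + math.log2(j)) for j, r in enumerate(rels, start=1))


def luce_plackett_prob(order: Sequence[int], scores: Sequence[float]) -> float:
    """Probability of the document order `order` (indices into `scores`).

    Product over positions j = 1..t-1 of the score at position j divided by
    the sum of the scores at positions j..t. Scores must be positive.
    """
    prob = 1.0
    for j in range(len(order) - 1):
        remaining = sum(scores[i] for i in order[j:])
        prob *= scores[order[j]] / remaining
    return prob


def app_dcg(scores: Sequence[float], rels: Sequence[float]) -> float:
    """Expected DCG over all t! orders under the Luce-Plackett distribution."""
    total = 0.0
    for order in permutations(range(len(scores))):
        total += dcg([rels[i] for i in order]) * luce_plackett_prob(order, scores)
    return total
```

With scores `[3.0, 1.0]` and relevances `[1.0, 0.0]`, the relevant document is placed first with probability 3/4, so `app_dcg` returns 0.75 · 1 + 0.25 · 0.5 = 0.875. Unlike plain DCG, this value changes continuously as the scores change.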
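
The kernel from slide 15 can be checked numerically. The sketch below assumes that $x$ is the position inside an interval rescaled to $[0, 1]$, and compares $\varphi$ with a plain linear weight. The names `phi_prime` and `linear` are introduced only for this illustration.

```python
def phi(x: float) -> float:
    """Kernel from slide 15: phi(x) = 2x^3 - 3x^2 + 1."""
    return 2.0 * x**3 - 3.0 * x**2 + 1.0


def phi_prime(x: float) -> float:
    """Analytic derivative of the kernel: phi'(x) = 6x^2 - 6x."""
    return 6.0 * x**2 - 6.0 * x


def linear(x: float) -> float:
    """Linear weight with the same end values; its slope is -1 everywhere."""
    return 1.0 - x
```

Both functions go from 1 at $x = 0$ to 0 at $x = 1$, but $\varphi$ has zero slope at both ends. When the weights of neighboring intervals are glued together, the linear version has a kink at every $a_i$, while the kernel version joins with matching (zero) derivatives.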