Lecture "Yandex Search Architecture"

Presentation Transcript

  • Contents
      • Measuring search quality.
      • Building the ranking function.
      • Problems in building the ranking function.
  • The search quality problem
    Problem: build a ranking function that orders documents by how well they match the search query.
    Given:
      • a set of queries $Q = \{q_1, \dots, q_n\}$;
      • a set of documents for each query, $q \to \{d_1, \dots, d_t\}$;
      • relevance judgments for <query, document> pairs, $rel(q, d) \in [0, 1]$.
    Unknown: $f(q, d)$ = ?
  • Measuring search quality
    Search quality is estimated by averaging a quality metric over the query set $Q$:
    $$Quality(f(q, d)) = \frac{1}{n} \sum_{q_i \in Q} Measure(\text{rank for } q_i)$$
    Example search quality metrics:
      • Precision-10: the number of documents with relevance greater than 0.5 in the top 10.
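The averaging on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function names, the dictionary layout, and the relevance values are assumptions made for the example. Each query maps to the relevance judgments of its documents, listed in ranked order.

```python
from typing import Callable, Dict, List


def precision_at_10(rels_in_rank_order: List[float], threshold: float = 0.5) -> int:
    """Count documents with relevance above `threshold` among the top 10."""
    return sum(1 for r in rels_in_rank_order[:10] if r > threshold)


def quality(rankings: Dict[str, List[float]],
            measure: Callable[[List[float]], float]) -> float:
    """Average a per-query measure over all queries in the set."""
    return sum(measure(rels) for rels in rankings.values()) / len(rankings)


# Hypothetical judgments: relevance of each returned document, in ranked order.
rankings = {
    "q1": [1.0, 0.7, 0.2, 0.9],  # three documents above 0.5
    "q2": [0.1, 0.6, 0.4],       # one document above 0.5
}

print(quality(rankings, precision_at_10))  # 2.0
```

Passing the measure as a parameter keeps `quality` reusable for any other per-query metric.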
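  • Example metrics
      • DCG (discounted cumulative gain), where $d_j$ is the document at position $j$ of the ranking:
        $$DCG(\text{rank for } q_i) = \sum_{j=1}^{N_q} \frac{rel(q_i, d_j)}{1 + \log_2 j}$$
      • nDCG (normalized discounted cumulative gain):
        $$nDCG(\text{rank for } q_i) = \frac{DCG(\text{rank for } q_i)}{DCG(\text{ideal rank for } q_i)}$$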
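A small Python sketch of the two formulas, assuming the ideal ranking is the one that sorts documents by decreasing relevance. The function names are chosen for the example.

```python
import math
from typing import List


def dcg(rels: List[float]) -> float:
    """DCG with the discount 1 / (1 + log2 j) for position j = 1, 2, ..."""
    return sum(r / (1.0 + math.log2(j)) for j, r in enumerate(rels, start=1))


def ndcg(rels: List[float]) -> float:
    """DCG divided by the DCG of the ideal (relevance-sorted) ranking."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


print(dcg([1.0, 0.5, 0.0]))   # 1.25: 1/1 + 0.5/2 + 0
print(ndcg([1.0, 0.5, 0.0]))  # 1.0: this order is already ideal
print(ndcg([1.0, 0.0, 0.5]))  # below 1: the 0.5 document is ranked too low
```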
  • Building the ranking function
    For each <query, URL> pair, a set of factors describing how well the document matches the query is computed:
    $$(q, d) \to (factor_1(q, d), \dots, factor_{100500}(q, d))$$
      • Statistics of query word occurrences in the document text: $TR$.
      • Statistics of query word occurrences in links to the document: $LR$.
      • Weight in the PageRank model: $PR$.
  • Building the ranking function
    The ranking function is a function of the set of factors. Documents for a query are ordered by the value of the ranking function.
    The task is to build the function $f(q, d)$ with the maximum value of the quality measure:
    $$good\ f(q, d) = \arg\max_{f} Quality(f(q, d))$$
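As a rough illustration of the arg max, the sketch below ranks documents by a function of their factors and picks the better of two candidate functions by mean DCG. The factor names PR, TR, LR come from the slides. The factor values, the relevance judgments, and the two candidate functions are invented for the example, and a real search would cover a far larger family of functions.

```python
import math
from typing import Callable, Dict, List, Tuple

# A document is (relevance, factors); all numbers here are illustrative.
Doc = Tuple[float, Dict[str, float]]


def dcg(rels: List[float]) -> float:
    return sum(r / (1.0 + math.log2(j)) for j, r in enumerate(rels, start=1))


def quality(f: Callable[[Dict[str, float]], float],
            queries: Dict[str, List[Doc]]) -> float:
    """Mean DCG over queries when documents are sorted by decreasing f."""
    total = 0.0
    for docs in queries.values():
        ranked = sorted(docs, key=lambda doc: f(doc[1]), reverse=True)
        total += dcg([rel for rel, _ in ranked])
    return total / len(queries)


queries = {
    "q1": [(1.0, {"PR": 0.1, "TR": 0.9, "LR": 0.9}),
           (0.0, {"PR": 0.9, "TR": 0.5, "LR": 0.5})],
    "q2": [(0.0, {"PR": 0.8, "TR": 0.2, "LR": 0.3}),
           (1.0, {"PR": 0.3, "TR": 0.8, "LR": 0.7})],
}

# Hypothetical candidate ranking functions.
candidates = {
    "pagerank_only": lambda x: x["PR"],
    "text_and_links": lambda x: x["TR"] * x["LR"],
}

best_name = max(candidates, key=lambda name: quality(candidates[name], queries))
print(best_name)  # text_and_links
```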
  • Problems in building the ranking function
    The function $f(q, d)$ is a smooth function of some set of parameters:
    $$f(q, d) = \alpha_1 \cdot PR + \alpha_2 \cdot TR \cdot LR + \dots$$
    $DCG(\dots)$ is a discontinuous function of those parameters, so ordinary gradient-based optimization methods cannot be applied. The quality metric $DCG(\dots)$ has to be smoothed.
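The discontinuity is easy to see numerically. In the sketch below, which uses two invented documents and fixes $\alpha_2 = 1$, DCG stays constant while $\alpha_1$ changes and then jumps when the two documents swap places. The gradient is therefore zero almost everywhere and undefined at the jump.

```python
import math
from typing import List, Tuple


def dcg(rels: List[float]) -> float:
    return sum(r / (1.0 + math.log2(j)) for j, r in enumerate(rels, start=1))


# (relevance, PR, TR, LR); illustrative values, not data from the lecture.
docs: List[Tuple[float, float, float, float]] = [
    (1.0, 0.1, 0.9, 0.9),
    (0.0, 0.9, 0.5, 0.5),
]


def dcg_for_alpha(alpha1: float, alpha2: float = 1.0) -> float:
    """DCG of the ranking induced by f = alpha1 * PR + alpha2 * TR * LR."""
    ranked = sorted(
        docs,
        key=lambda d: alpha1 * d[1] + alpha2 * d[2] * d[3],
        reverse=True,
    )
    return dcg([d[0] for d in ranked])


values = [dcg_for_alpha(a) for a in (0.0, 0.5, 0.69, 0.71, 1.0)]
print(values)  # [1.0, 1.0, 1.0, 0.5, 0.5]: a step near alpha1 = 0.7
```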
  • Smoothing approaches
    For the documents of query $q$, the values of the ranking function $f(q, d)$ are computed:
    $$d_1 \to fl_1 = f(q, d_1), \ \dots, \ d_t \to fl_t = f(q, d_t)$$
    These values $(fl_1, \dots, fl_t)$ are assumed to induce a probability distribution over all permutations $S_t$ of the query's documents. The smoothed metric is computed as the expectation of the DCG metric under this distribution:
    $$appDCG = \sum_{ord \in S_t} DCG(ord) \cdot Prob(ord \mid (fl_1, \dots, fl_t))$$
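A brute-force sketch of this expectation enumerates every permutation. The probability model is passed in as a function. The `uniform_prob` model used here is only a placeholder so that the block runs, and it ignores the scores entirely. It is not one of the models discussed in the lecture.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple


def dcg(rels: List[float]) -> float:
    return sum(r / (1.0 + math.log2(j)) for j, r in enumerate(rels, start=1))


def app_dcg(rels: Sequence[float],
            scores: Sequence[float],
            prob: Callable[[Tuple[int, ...], Sequence[float]], float]) -> float:
    """Expected DCG over all t! orderings, weighted by prob(order | scores)."""
    total = 0.0
    for order in itertools.permutations(range(len(rels))):
        total += dcg([rels[i] for i in order]) * prob(order, scores)
    return total


def uniform_prob(order: Tuple[int, ...], scores: Sequence[float]) -> float:
    """Placeholder model (an assumption): every ordering is equally likely."""
    return 1.0 / math.factorial(len(scores))


# Two documents: DCG is 1.0 for one order and 0.5 for the other.
print(app_dcg([1.0, 0.0], [0.3, 0.7], uniform_prob))  # 0.75
```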
  • Probability functions: the Luce-Plackett model (ListNet algorithm, 2007)
    The probability of a particular document order $(d_{i_1}, \dots, d_{i_t})$ is
    $$Prob((d_{i_1}, \dots, d_{i_t}) \mid (fl_1, \dots, fl_t)) = \prod_{j=1}^{t-1} \frac{fl_{i_j}}{\sum_{k=j}^{t} fl_{i_k}}$$
    Drawback: the sum over all permutations has $t!$ terms, which makes computing it in reasonable time impractical.
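The product formula can be sketched directly, assuming all scores are strictly positive so that every denominator is nonzero. The function name is chosen for the example. Summing the result over all orderings gives 1, which is a quick sanity check that it defines a distribution.

```python
import itertools
from typing import Sequence, Tuple


def plackett_luce_prob(order: Tuple[int, ...], scores: Sequence[float]) -> float:
    """Probability of `order` (a tuple of document indices, best first).

    Implements prod_{j=1}^{t-1} fl_{i_j} / sum_{k=j}^{t} fl_{i_k};
    assumes every score is strictly positive.
    """
    p = 1.0
    for j in range(len(order) - 1):
        remaining = sum(scores[i] for i in order[j:])
        p *= scores[order[j]] / remaining
    return p


scores = [0.5, 0.3, 0.2]
print(plackett_luce_prob((0, 1, 2), scores))  # (0.5 / 1.0) * (0.3 / 0.5), about 0.3

total = sum(plackett_luce_prob(order, scores)
            for order in itertools.permutations(range(len(scores))))
print(total)  # sums to 1 up to rounding, over t! = 6 orderings
```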
  • Probability functions: TieRank, 2011 (A. Kustarev, I. Segalovich)
    The final values of the ranking function are assumed to take only a finite set of values $0 \le a_1 < a_2 < \dots < a_m \le 1$.
    If the value $f(q, d)$ differs from every number in the set and falls into an interval between two of them, $a_i < f(q, d) < a_{i+1}$, then the document's ranking value is taken to be $a_i$ with probability $\frac{a_{i+1} - f(q, d)}{a_{i+1} - a_i}$ and $a_{i+1}$ with probability $\frac{f(q, d) - a_i}{a_{i+1} - a_i}$.
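A minimal sketch of this randomized rounding follows. It assumes the rounding is unbiased, so the probability of the upper grid value grows linearly with the score and the expected rounded value equals the score. It also clamps scores outside the grid. Both choices, like the function name, are assumptions of the example and not details confirmed by the slides.

```python
import bisect
from typing import Dict, Sequence


def rounding_probs(score: float, grid: Sequence[float]) -> Dict[float, float]:
    """Map a score to {grid value: probability} over its two neighbours.

    `grid` must be sorted in increasing order. Scores outside the grid are
    clamped to its ends (an assumption made for this sketch).
    """
    score = min(max(score, grid[0]), grid[-1])
    hi_idx = bisect.bisect_left(grid, score)  # first grid value >= score
    if grid[hi_idx] == score:
        return {score: 1.0}  # the score is already a grid value
    lo, hi = grid[hi_idx - 1], grid[hi_idx]
    x = (score - lo) / (hi - lo)  # normalized offset in (0, 1)
    return {lo: 1.0 - x, hi: x}


grid = [0.0, 0.5, 1.0]
probs = rounding_probs(0.6, grid)
print(probs)  # about {0.5: 0.8, 1.0: 0.2}
print(sum(value * p for value, p in probs.items()))  # about 0.6, the original score
```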
  • Probability functions: TieRank, 2011 (A. Kustarev, I. Segalovich)
    For this model the sum
    $$appDCG = \sum_{ord \in S_t} DCG(ord) \cdot Prob(ord \mid (fl_1, \dots, fl_t))$$
    has at most $2^t$ terms, far fewer than $t!$.
    The resulting function is continuous, but its derivative is discontinuous. To fix this, a kernel function is used:
    $$\varphi(x) = 2x^3 - 3x^2 + 1, \qquad \frac{f(q, d) - a_i}{a_{i+1} - a_i} \to \varphi\!\left(\frac{f(q, d) - a_i}{a_{i+1} - a_i}\right)$$
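The kernel's role can be checked numerically. The polynomial $\varphi$ is taken from the slide; its derivative is $6x^2 - 6x$, which vanishes at both ends of $[0, 1]$. Reading $\varphi(x)$ as the probability of the lower grid value, with $x$ the normalized offset of the score inside its interval, is an assumption of this sketch, as are the function names.

```python
import math
from typing import Tuple


def phi(x: float) -> float:
    """Kernel from the slide: phi(0) = 1 and phi(1) = 0."""
    return 2 * x**3 - 3 * x**2 + 1


def phi_prime(x: float) -> float:
    """Derivative 6x^2 - 6x: zero at both x = 0 and x = 1."""
    return 6 * x**2 - 6 * x


def smoothed_probs(score: float, lo: float, hi: float) -> Tuple[float, float]:
    """(P(lower value), P(upper value)) for lo <= score <= hi.

    Assumes phi of the normalized offset is the probability of `lo`.
    """
    x = (score - lo) / (hi - lo)
    return phi(x), 1.0 - phi(x)


print(phi(0.0), phi(0.5), phi(1.0))     # 1.0 0.5 0.0
print(phi_prime(0.0), phi_prime(1.0))   # 0.0 0.0: no kink at the grid points
print(smoothed_probs(0.75, 0.5, 1.0))   # (0.5, 0.5) at the interval midpoint

t = 10
print(2**t, math.factorial(t))          # 1024 3628800
```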
  • It worked!!!