SIGIR 2014 Study Group (SlideShare)

[SIGIR 2014 Study Group]
Session 4: (I Can't Get No) Satisfaction
Presenter: Yamamoto, Denso IT Laboratory
The figures in these slides are quoted from the papers.

Papers presented
• (1) Context-Aware Web Search Abandonment Prediction
  Yang Song, Xiaolin Shi, Ryen W. White, Ahmed Hassan (Microsoft Research)
  • Context-aware prediction of abandonment queries (queries for which no search result is ever clicked)
• (2) Impact of Response Latency on User Behavior in Web Search
  Ioannis Arapakis, Xiao Bai and B. Barla Cambazoglu (Yahoo Labs)
  • The impact of response latency on user behavior in web search
• (3) Towards Better Measurement of Attention and Satisfaction in Mobile Search
  Dmitry Lagun (Emory University), Chih-Hung Hsieh (Google), Dale Webster (Google), Vidhya Navalpakkam (Google)
  • Better methods for measuring attention and satisfaction in mobile search
• (4) Modeling Action-level Satisfaction for Search Task Satisfaction Prediction
  Hongning Wang (Department of Computer Science, University of Illinois at Urbana-Champaign),
  Yang Song, Ming-Wei Chang, Xiaodong He, Ahmed Hassan, Ryen W. White (Microsoft Research)
  • Estimating the satisfaction of each action within a sequential search task
Session 4: (I Can't Get No) Satisfaction  Presenter: 山本光穂 (Yamamoto)

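Context-Aware Web Search Abandonment Prediction
Yang Song, Xiaolin Shi, Ryen W. White, Ahmed Hassan (Microsoft Research)
• Research background and problem
  • Some searches consist only of entering a query: the user never clicks a single search result.
  • Recent search systems, however, can satisfy the user without any click on a result (good abandonment query).
    • Examples: weather forecasts, maps, KG (Knowledge Graph), snippets
• Contributions of this work
  • A study of good abandonment vs. bad abandonment
    • Observes large-scale click-level and session-level data
      ¨ Characteristic patterns appear in query/session length, inter-query time, and similar signals.
  • Proposes an abandonment prediction model built on a structured learning framework
    • Uses an SVM
  • Proposes a method that uses the model to improve the relevance (precision) of a search system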
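
As a rough illustration of the log signals mentioned above (query/session length, inter-query time), the sketch below flags abandoned queries, meaning queries with no click before the next query, and collects a few per-session features. The `Event` record, its field names, and the sample session are hypothetical and not taken from the paper, whose feature set is richer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    time: float     # seconds since session start (assumed unit)
    kind: str       # "query" or "click" (hypothetical event types)
    text: str = ""  # query string, or clicked URL


def session_features(events: List[Event]) -> dict:
    """Per-session features: abandonment flags, query lengths, inter-query times."""
    queries = [e for e in events if e.kind == "query"]
    abandoned = []
    for i, q in enumerate(queries):
        # A query "owns" the clicks issued before the next query starts.
        end = queries[i + 1].time if i + 1 < len(queries) else float("inf")
        clicked = any(e.kind == "click" and q.time <= e.time < end for e in events)
        abandoned.append(not clicked)
    gaps = [b.time - a.time for a, b in zip(queries, queries[1:])]
    return {
        "session_length": len(queries),
        "query_lengths": [len(q.text.split()) for q in queries],
        "inter_query_times": gaps,
        "abandoned": abandoned,
    }


# Made-up session: the first query is abandoned, the second gets a click.
session = [
    Event(0.0, "query", "tokyo weather"),
    Event(12.0, "query", "sigir 2014 gold coast"),
    Event(15.5, "click", "http://example.com/sigir"),
]
features = session_features(session)
```

Note that this only detects abandonment. Telling good abandonment from bad abandonment is the prediction problem the paper addresses, and it needs the context features and the learned model.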

Towards Better Measurement of Attention and Satisfaction in Mobile Search
Dmitry Lagun (Emory University), Chih-Hung Hsieh (Google), Dale Webster (Google), Vidhya Navalpakkam (Google)
• Mobile search presents Knowledge Graph (KG) results (weather, person information).
• A KG result gives the user the information (the answer) just by being looked at.
  → No click log can be obtained.
  → Relevance is hard to evaluate.
• What was investigated
  • Satisfaction of Rich Results
    • KG shown vs. not shown; KG relevant vs. not relevant
  • Attention Measurement
    • A mobile eye tracker tracks the user's gaze points.

Can Implicit User Metrics Indicate Answer Relevance?
• Page and Task metrics
  – Time on SERP
  – Number of Scrolls
  – Time on Task
• Gaze Metrics
  – Time on Rich Result (and %)
  – Total Time below Rich Result (and %)
• Viewport Metrics
  – Time on Rich Result (and %)
  – Total Time below Rich Result (and %)

User Study Details
• Participants
  – 24 users (diverse background, age, occupation)
• Mobile Eye Tracker Setup
• Calibration Directly on Phone Screen
(Figure: Knowledge Graph Result)
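
A viewport metric such as "Time on Rich Result (and %)" can be approximated from scroll events alone, without an eye tracker. The sketch below is one possible reading under stated assumptions: the log format `(timestamp, scroll_top)`, the pixel coordinates, and the rule "any overlap with the viewport counts as visible" are all hypothetical, and the paper's exact definition may differ (for example, by weighting by visible area).

```python
from typing import List, Tuple


def viewport_time_on_element(
    scroll_log: List[Tuple[float, float]],  # (timestamp_s, scroll_top_px), sorted by time
    end_time: float,
    viewport_height: float,
    elem_top: float,
    elem_bottom: float,
) -> Tuple[float, float]:
    """Return (seconds the element was at least partly visible, share of total time)."""
    visible = 0.0
    for i, (t, top) in enumerate(scroll_log):
        # Each scroll position holds until the next event (or the end of the page view).
        t_next = scroll_log[i + 1][0] if i + 1 < len(scroll_log) else end_time
        # Visible when the element's extent overlaps [top, top + viewport_height).
        if elem_top < top + viewport_height and elem_bottom > top:
            visible += t_next - t
    total = end_time - scroll_log[0][0]
    return visible, (visible / total if total > 0 else 0.0)


# Made-up page view: the rich result occupies pixels 0..400 of the page.
log = [(0.0, 0.0), (4.0, 300.0), (6.0, 900.0)]
seconds, share = viewport_time_on_element(
    log, end_time=10.0, viewport_height=600.0, elem_top=0.0, elem_bottom=400.0
)
```

"Total Time below Rich Result" follows from the same function by passing the extent of everything under the rich result as the element.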

Results Summary (experimental results)
• Attention Measurement
  • Viewport ≈ Gaze (on mobile): %Gaze Time vs. %Viewport Time, Pearson R = 0.69
    • What is shown in the viewport and where the user actually looks largely coincide.
  • The top half of the screen receives more attention.
  • "Short-Scroll" effect: on mobile, unlike on desktop (Granka et al., WWW 2004), the top-ranked search result is not the focus of attention!
• Satisfaction with Rich Results (KG Relevant vs. Not Relevant)
  • More results are viewed if the answer (KG) is not relevant.
  • No impact on user satisfaction when the KG is not relevant!
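
The "Viewport ≈ Gaze" claim rests on a Pearson correlation (R = 0.69) between the share of gaze time and the share of viewport time. For reference, a minimal sketch of that statistic follows; the two sample lists are made-up numbers, not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical per-result shares of gaze time and viewport time (percent).
gaze_pct = [35.0, 28.0, 15.0, 12.0, 10.0]
viewport_pct = [30.0, 22.0, 25.0, 13.0, 10.0]
r = pearson_r(gaze_pct, viewport_pct)
```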

Are attention patterns similar on desktop and mobile?
• On desktop, viewing time follows the ranking: higher-ranked results are viewed longer (Granka et al., WWW 2004).
• What about on mobile?

Viewing Time vs. Result Position
• On desktop: Granka et al., WWW 2004
• [Conclusion] On mobile, viewing time is not proportional to rank position!
• Why?

Reason: the Short Scroll Effect

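Modeling Action-level Satisfaction for Search Task Satisfaction Prediction
• Conventional search evaluation (query-based evaluation) measures how relevant the returned documents are for a given query.
• It cannot be used for task-based evaluation (e.g., surveying a research topic).
  → Search task satisfaction prediction is needed.
• Key feature of this work
  • To predict search task satisfaction, it estimates the satisfaction of each individual action.
(Figure: a search task running from START to END through queries Q1 to Q5 and clicked documents D21, D24, D31, D51, D54, each action marked with a satisfaction label.)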

■ Problem definition
Given a user u's search task t, search-task satisfaction is a binary label y_t: y_t = 1 if the user's information need has been met, thus resulting in a satisfying search task; otherwise y_t = 0.

Existing work (search task satisfaction prediction)
• Modeling task holistically [Feild et al. SIGIR'10, Kim et al. WSDM'14]
  • Binary classifier with expressive features for predicting task-level satisfaction
  → Detailed action-level satisfaction is ignored
• Modeling individual user's search behavior [Hassan et al. WSDM'10, Ageev et al. SIGIR'11]
  • Markov model for sequential search behaviors
  → No discrimination between satisfying and unsatisfying actions

Rich knowledge conveyed in action-level satisfaction
• Estimation of URL utility [Georges et al. WSDM'10]
  – Identifying relevant documents
• Estimation of query quality [White et al. SIGIR'10]
  – Discovering query reformulations
• Search engine performance debugging
  – Locating the turning point where the search goes wrong
(Figure: a search task from START to END through queries Q1 to Q5 and documents D21, D24, D31, D51, D54, each action labeled + or -.)
SIGIR'2014 @ Gold Coast, 2014/09/02

Modeling Action-level Satisfaction for Search Task
• Task satisfaction: y
• Action satisfaction (latent): h1, h2, h3, ..., hn
• Actions: Start, a1, a2, a3, ..., an, End
• Short-range features:
  1. #clicks, #queries, last action
  2. Dwell time, query-URL match, domain
• Long-range features:
  1. existSatQ, allSatQ
  2. action transitions
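
To make the feature lists concrete, the sketch below computes a few short-range and long-range features for one task, given a candidate labeling h of the latent action satisfaction. Everything here is an assumption for illustration: the `(kind, dwell_seconds)` action encoding is hypothetical, and `exist_sat_q` / `all_sat_q` follow a plain reading of the names existSatQ and allSatQ ("some query is satisfying", "all queries are satisfying"), which the slide does not define.

```python
from collections import Counter
from typing import Dict, List, Tuple

Action = Tuple[str, float]  # (kind, dwell_seconds); kind is "Q" (query) or "C" (click)


def task_features(actions: List[Action], h: List[int]) -> Dict[str, float]:
    """Features of one task under a candidate latent labeling h (1 = satisfying)."""
    if len(actions) != len(h):
        raise ValueError("one latent label per action is required")
    kinds = [k for k, _ in actions]
    sat_q = [label for (k, _), label in zip(actions, h) if k == "Q"]
    feats: Dict[str, float] = {
        # Short-range features: depend only on the observed actions.
        "num_queries": float(kinds.count("Q")),
        "num_clicks": float(kinds.count("C")),
        "last_action_is_click": float(kinds[-1] == "C") if kinds else 0.0,
        "total_click_dwell": sum(d for k, d in actions if k == "C"),
        # Long-range features: depend on the latent labels across the whole task.
        "exist_sat_q": float(any(sat_q)),
        "all_sat_q": float(bool(sat_q) and all(sat_q)),
    }
    # Transition counts over (kind, label) states, e.g. "Q0->C1".
    states = [f"{k}{label}" for k, label in zip(kinds, h)]
    for (a, b), n in Counter(zip(states, states[1:])).items():
        feats[f"trans:{a}->{b}"] = float(n)
    return feats


# Made-up task: an unsatisfying query, a satisfying query, then a satisfying click.
actions = [("Q", 3.0), ("Q", 5.0), ("C", 42.0)]
feats = task_features(actions, h=[0, 1, 1])
```

In a latent-variable model, h is not observed: the predictor would score such feature vectors for candidate labelings and keep the best-scoring one.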

Model training
• The model weights ω are estimated with an SVM.
• The margin is the error between the predicted value and the true value.
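
The slide's own formula is not preserved here. For orientation only, the following is the standard latent structural SVM objective, which matches the description (weights ω learned with an SVM, margin given by the prediction error). It is an assumed general form, not the paper's exact equation: x_i is the action sequence of task i, y_i its task label, h the latent action labels, Φ a joint feature map, Δ the margin (loss) term, and C a regularization constant.

```latex
\min_{\omega}\; \frac{1}{2}\lVert \omega \rVert^{2}
  + C \sum_{i} \Bigl[
      \max_{\hat{y},\,\hat{h}} \bigl( \omega^{\top} \Phi(x_i, \hat{y}, \hat{h}) + \Delta(y_i, \hat{y}) \bigr)
      - \max_{h}\; \omega^{\top} \Phi(x_i, y_i, h)
    \Bigr]
```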

Task satisfaction prediction evaluation
• Evaluation data sets
  – Toolbar data [Hassan et al. CIKM'11]
    • Explicit ratings of satisfaction from actual IE users
  – "Find It if You Can" game [Ageev, et al. SIGIR'11]
    • Controlled experiment with editor-annotated action & task satisfaction labels
  – Search log data
    • 4-month Bing search log

               # Users   # Tasks   Length of task   SAT/DSAT
Toolbar data   153       7306      5.2 +/- 6.6      6.84:1
Contest data   156       1487      6.2 +/- 5.9      6.70:1
Search log     2.4M      7.7M      7.1 +/- 11.8     -
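
The SAT/DSAT ratios above mean both labeled data sets are imbalanced, which matters when reading accuracy figures. A quick sketch, assuming the ratio is the number of SAT tasks per DSAT task, computes the accuracy of a trivial predictor that always outputs the majority class:

```python
def majority_baseline_accuracy(sat_to_dsat_ratio: float) -> float:
    """Accuracy of always predicting the majority class when SAT:DSAT = ratio:1."""
    if sat_to_dsat_ratio <= 0:
        raise ValueError("ratio must be positive")
    return max(sat_to_dsat_ratio, 1.0) / (sat_to_dsat_ratio + 1.0)


toolbar_baseline = majority_baseline_accuracy(6.84)
contest_baseline = majority_baseline_accuracy(6.70)
```

Both baselines come out at roughly 0.87, so accuracy alone says little here; per-class scores for the rare DSAT class are more informative.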

Task-level satisfaction prediction performance
• Toolbar data set

              Avg-F1   F1 (SAT)   F1 (DSAT)   Accuracy
MML           0.707    0.897      0.518       0.830
LogiReg       0.740    0.918      0.563       0.861
Session-CRF   0.728    0.910      0.545       0.850
AcTS          0.761*   0.938*     0.584*      0.893*
AcTS          0.739    0.924      0.554       0.868

* indicates p-value < 0.01
Assumption: action satisfaction = task satisfaction
MML: Markov Model Likelihood
LogiReg: binary logistic regression model
Session-CRF: action-level satisfaction labels set equal to the task-level label
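
In every row above, the first score equals the mean of the second and third (for the starred AcTS row, (0.938 + 0.584) / 2 = 0.761), which is consistent with a macro-average over two per-class scores. The sketch below shows how per-class F1, their average, and accuracy would be computed from binary predictions. Reading the columns as F1 for the SAT and DSAT classes is an assumption, and the labels and predictions are made up.

```python
from typing import List, Tuple


def f1_for_class(y_true: List[int], y_pred: List[int], positive: int) -> float:
    """F1 score treating `positive` as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def report(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float, float]:
    """Return (average F1, SAT F1, DSAT F1, accuracy); label 1 = SAT, 0 = DSAT."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need non-empty label lists of equal length")
    sat = f1_for_class(y_true, y_pred, positive=1)
    dsat = f1_for_class(y_true, y_pred, positive=0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return (sat + dsat) / 2, sat, dsat, acc


# Made-up labels: four SAT tasks, two DSAT tasks.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
avg_f1, sat_f1, dsat_f1, acc = report(y_true, y_pred)
```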

Task-level satisfaction prediction performance
• Contest data set

               Avg-F1   F1 (SAT)   F1 (DSAT)   Accuracy
MML            0.658    0.901      0.414       0.831
LogiReg        0.682    0.930      0.435       0.875
Session-CRF    0.685    0.921      0.449       0.862
AcTS           0.701*   0.934      0.469*      0.882
AcTS           0.687    0.925      0.449       0.868
Labeled-AcTS   0.649    0.945      0.352       0.899

* indicates p-value < 0.01
Assumption: action satisfaction = task satisfaction
Labeled-AcTS: with editor's action-level annotations
