Rank by Time or by Relevance?
Revisiting Email Search
November 17th, 2015
David Carmel, Guy Halawi, Liane Lewin-Eytan, Yoelle Maarek, Ariel Raviv
Haifa Labs
Motivation
▪  “Email search still remains difficult, time-consuming and
frustrating” (Elsweiler et al. 2011)
▪  By default, all existing Web mail services display search results
in reverse chronological order
▪  makes the discovery of older messages very hard
▪  Imposes strict constraints on message matching
Email Search Today (Time ordered)
Searching for an (old) application form for “Visa to India”
Search in Yahoo Mail
▪  Boolean Search model
•  Each query is a Boolean expression (AND, OR, NOT)
•  Generally, all query terms must appear in at least one of the
message fields (AND operation)
▪  Ranking
•  Default: by Recency (Reverse Chronological ordering)
•  (pseudo)-Relevance – implementation is based on matching
query terms
▪  almost never used by users
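The restricted (AND) matching model above can be sketched as follows; the whitespace tokenization and flat field handling are illustrative simplifications, not Yahoo Mail's actual implementation:

```python
# Minimal sketch of the restricted (AND) matching model: a message
# matches only if every query term appears in at least one field.
# Whitespace tokenization is a simplification for illustration.
def matches_and(query_terms, message_fields):
    """message_fields: dict mapping field name (subject, from, body, ...) to text."""
    tokens = set(" ".join(message_fields.values()).lower().split())
    return all(term.lower() in tokens for term in query_terms)

msg = {"subject": "Visa to India", "body": "application form attached"}
print(matches_and(["visa", "india"], msg))   # True
print(matches_and(["visa", "brazil"], msg))  # False
```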
Challenge
▪  Challenge the traditional prevalent chronological ranking in Web email
search
›  investigate whether an email-specific relevance ranking could bring any value to our
users
▪  Introduce mail-specific relevance ranking consisting of two phases:
›  Relaxed matching phase to improve recall
›  Comprehensive ranking phase using a rich set of mail-specific features to improve
precision
Email Queries: What do people search for?
▪ Very short queries: 1.5 terms on avg.
› Re-find intent
– looking for a specific previous message
› Contact queries ~40%
– Picture, email address, phone number,
physical address, links, attachments,
appointments (time/date), conversation
▪ Tasks involved:
› Couponing (Pizza coupon?)
› Tracking items (bill paid, package shipped)
› Looking up account / registration info
› Social media (searching for comments/posts)
The Search Process
▪  Standard two-phase retrieval process:
›  First phase: retrieve a pool of messages qualified as potentially relevant
to the query
•  Two matching models:
–  Restricted (AND mode)
–  Relaxed: any message containing at least one of the query terms in any
of its fields is considered a match
›  Second phase:
•  Ranks these messages using a rich set of features
–  Scores messages with a linear model learned using a learning-to-rank
approach
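The relaxed first phase can be sketched as below; the pool cap of 100 follows the "~100 matched messages" per data point quoted on the LTR slide, and everything else is an illustrative assumption:

```python
# Sketch of the relaxed (recall) phase: a message matches if ANY query
# term appears in ANY field; the second phase then re-ranks this pool.
def matches_relaxed(query_terms, message_fields):
    tokens = set(" ".join(message_fields.values()).lower().split())
    return any(term.lower() in tokens for term in query_terms)

def candidate_pool(query_terms, messages, limit=100):
    # limit mirrors the ~100 matched messages per LTR data point
    return [m for m in messages if matches_relaxed(query_terms, m)][:limit]

inbox = [{"subject": "pizza coupon"},
         {"subject": "flight to india"},
         {"subject": "visa application"}]
print(len(candidate_pool(["visa", "india"], inbox)))  # 2
```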
REX - Relevance EXtended Ranking Model
Based on an LTR framework using
several sets of features:
▪  Message
▪  Recipient
▪  Sender
▪  Message-Query Similarity
Message Features
▪  Freshness exponential decay over the message age
▪  User Actions replied, forwarded, flagged, drafted, read,..
▪  Attachment has attachment, attachment type / size
▪  Folder folder type (inbox, draft, sent, user defined folder)
▪  Exchange Type reply/forward, in-thread
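The Freshness feature's exponential decay over message age can be sketched as below; the decay rate is an arbitrary illustrative value, as the deck does not give one:

```python
import math

# Sketch of the Freshness feature: exponential decay over message age.
# The decay rate lam is an arbitrary illustrative value (not from the deck).
def freshness(age_days, lam=0.01):
    return math.exp(-lam * age_days)

print(round(freshness(0), 3))    # 1.0
print(round(freshness(365), 3))  # 0.026
```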
Recipient Features
▪  To recipient mentioned in To
▪  Cc recipient mentioned in Cc
▪  In Group recipient was not mentioned explicitly
Sender Features
Vertical:
▪  User-sender connection correspondence volume / type
▪  Self correspondence sender is user
Horizontal:
▪  Sender inbound / outbound traffic volume and ratio
▪  Sender urls usage volume and ratio in messages
▪  Sender recipients number avg. per message
▪  Sender recipients actions ratio over messages
Message-Query Similarity Features
▪  BM25f textual similarity between a query and the entire message
•  Considering query term distribution over message fields (Subject, From, To, Body,
Attachment)
▪  TF-IDF measures the (tf-idf) similarity of each message field
independently of the others
▪  Coord fraction of query terms that occur in the message
Proximity
Taking into account proximity between query terms in content
▪  Neighborhood boosting consecutive matches
▪  Proximity boosting tokens found closely in a fixed
window (5) with no ordering
▪  Prefix allowing prefix match but with score decay using
length difference
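Two of the similarity features above (Coord and windowed Proximity) can be sketched as follows; the whitespace tokenizer is a simplification, the window of 5 matches the deck, and field weighting and BM25f are omitted:

```python
# Illustrative sketch of two similarity features: Coord (fraction of
# query terms present) and windowed Proximity (all terms within a fixed
# 5-token window, any order). Whitespace tokenization is a simplification.
def coord(query_terms, message_text):
    tokens = set(message_text.lower().split())
    return sum(1 for t in query_terms if t.lower() in tokens) / len(query_terms)

def proximity(query_terms, message_text, window=5):
    tokens = message_text.lower().split()
    terms = {t.lower() for t in query_terms}
    return int(any(terms <= set(tokens[i:i + window])
                   for i in range(len(tokens))))

print(coord(["visa", "india"], "visa application form for India"))  # 1.0
print(proximity(["visa", "india"], "visa to india soon"))           # 1
```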
Learning to Rank (LTR)
▪  Data point: < query | ~100 matched messages | clicked message >
▪  Datasets:
›  Corporate 100K random queries from the corporate query log
›  Web-mail 10K random queries
›  Editorial 500 queries judged by editors
▪  LTR Algorithm AROW (Crammer et al. 2013)
[Figure: pairwise LTR illustration — candidate messages d1..d4 are each scored by a linear model ∑ wi·fi(d); when the clicked message d4 scores lower than a non-clicked message (∑ wi·fi(d4) < ∑ wi·fi(d1)), the weight vector w is updated so that d4 moves toward the top.]
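The pairwise update in the LTR illustration can be sketched as a simple perceptron-style step; note this is not the AROW algorithm the deck cites, which additionally maintains a confidence estimate per weight:

```python
# Perceptron-style pairwise update in the spirit of the LTR illustration:
# if the clicked message scores below another candidate, nudge the weight
# vector w toward the clicked message's features. This is NOT the AROW
# algorithm used in the paper, which also keeps per-weight confidence.
def score(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def pairwise_update(w, f_clicked, f_other, lr=0.1):
    if score(w, f_clicked) <= score(w, f_other):  # clicked ranked too low
        w = [wi + lr * (fc - fo) for wi, fc, fo in zip(w, f_clicked, f_other)]
    return w

w = pairwise_update([0.0, 0.0], f_clicked=[1.0, 0.2], f_other=[0.1, 0.9])
print(score(w, [1.0, 0.2]) > score(w, [0.1, 0.9]))  # True
```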
Experimental Results
Performance Measures
▪  Mean reciprocal rank (MRR)
the average, over all queries, of the
reciprocal rank of the clicked message
▪  Success@K
The fraction of queries for which the clicked message is
found in the top-K results
▪  NDCG@K
when we have several relevance feedback levels
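Assuming a single clicked (relevant) message per query with ranks counted from 1, the first two measures can be computed as:

```python
# Sketch of the evaluation measures, assuming exactly one clicked
# (relevant) message per query and ranks counted from 1.
def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def success_at_k(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 2, 10]           # rank of the clicked message, per query
print(round(mrr(ranks), 4))     # 0.4833
print(success_at_k(ranks, 3))   # 0.75
```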
Time Vs. REX (Corporate Dataset)
Algorithm MRR (+lift %)
Time 0.3722
REX (fresh. + sim.) 0.4261 (+14.48%)
REX (fresh. + sim. + actions) 0.4550 (+22.24%)
REX (fresh. + sim. + actions + sender) 0.4548 (+22.19%)
Time Vs. REX (Web Mail Dataset)
Algorithm MRR (+lift %)
Time 0.3717
REX (fresh. + sim.) 0.3785 (+1.81%)
REX (fresh. + sim. + actions) 0.4238 (+14%)
REX (fresh. + sim. + actions + sender) 0.4258 (+14.55%)
Time vs REX (as a function of the Result set size)
Relative improvement of REX over Time increases as more
messages in the user inbox match the query
Relative Feature Importance
▪  In general, REX ranker significantly outperforms Chronological ranker
›  both in the Corporate and in the Web datasets
▪  Relative Feature Importance:
Freshness >> User actions >> Similarity >> Sender features
›  Freshness:
•  Years >> Months >> Weeks >> Days
›  User actions
•  Read >> Forwarded >> Flagged >> Replied >> Draft >> Ham >> Spam
›  Similarity:
•  coord >> tf-idf (From > Subject > Body >Attachment > To) >> BM25f
▪  Surprisingly low significance of the sender features: why?
Time Vs. REX (Editorial Dataset)
Algorithm MRR (+lift %) NDCG@10 (+lift %)
Time 0.3629 0.4936
REX 0.5105 (+40.65%) 0.6647 (+34.66%)
| Query | Intent | Algo A: Most Relevant | Algo A: Related | Algo B: Most Relevant | Algo B: Related |
|---|---|---|---|---|---|
| Lila dress | Discussion about dress for party | 2 | 3,5,7 | 4 | 1,2,6,7 |
| Spense KE | Schedule for Spense KE meeting | 5 | 2,4,8,9 | 1 | 3,4,5 |
Editors Feedback
“... Sometimes, I had the feeling that Algo. B was
really reading my mind to put in the first place
exactly the email message I was thinking of ...”
“...Today, after I ran it again, it was not that much
impressive, but still I have the feeling it was the
type of search that gave me the best results..."
Email Search Tomorrow (REX ordered)
Searching for an (old) application form for “Visa to India”
Conclusions
●  We challenged the traditional chronological sort for email search
o  While freshness is still super important, it should be integrated into the
relevance model together with many other important features
o  REX performs significantly better than time-based ranking
o  The model can easily be extended with more signals as they become
available
●  Are mail users ready to depart from chronological sort in favor of
modern relevance ranking?
●  Time will tell
●  At the very least, REX gives our users the option
●  More details can be found in our CIKM 2015 paper:
o  Rank by Time or by Relevance? Revisiting Email Search
Future Work
▪  Enriching the set of ranking features
›  Solving the mystery:
•  why the Sender features do not contribute to the ranking
›  Adding query-based features based on query intent analysis
▪  Personalization
›  Adding the User into the ranking model
▪  User Study
›  Better understanding user needs
•  how users search over their mailboxes
Yahoo's new Mobile Mail application
Thanks for listening
