This document discusses different methods for sorting facet results in Solr. It begins with an overview of facets and built-in sorting methods like name and count sorting. It then explores adding relevancy by sorting on static scores or dynamically calculated scores. The document considers approaches like using a custom collector, API wrapper, or blended scoring from multiple data sources. The preferred approach is a blended method using machine learning to combine facet scores with user feedback data for the most relevant sorting.
Smart Facets at Rakuten: Presented by Keith Thoma & Michael Pellegrini, Rakuten USA
1. October 11-14, 2016 • Boston, MA
2. Smart Facets @ Rakuten
Keith Thoma & Michael Pellegrini
Rakuten USA
3. About Rakuten
• Founded in 1997 in Japan
• Operates Rakuten Ichiba, the largest e-commerce site in Japan
• One of the 15 largest internet companies in the world
• 10,000+ employees worldwide
• $6.3 billion in revenue in FY2015
6. Solr at Rakuten
• 30+ Services within the Rakuten group using Solr
• Solr supported in 10+ languages
• At Rakuten.com
• Supported via Solr
• Over 30 million products and 90 million different items
• Thousands of unique categories and attributes to search against
• Millions of queries a day!
9. Facet Sorting Criteria
• Top facets are relevant to query
• Top facet order reflects relevancy
• Easy to maintain over time
• Acceptable latency in production
10. Latency Impact Of Facets
• Facets are expensive
• Extra logic can be a performance hit
• In some cases, facets can slow down queries by 10x
• OOMs in extreme cases
12. Example - iPhone 6 Cases
Assume we have the following brand facets for the query iPhone 6 Cases
Brands
• Otterbox
• Incipio
• Apple
• AAA Phone Cases
13. Search Results for iPhone 6 Cases
• Result 1: Otterbox iPhone Case, Brand: Otterbox
• Result 2: Otterbox iPhone Case, Brand: Otterbox
• Result 3: Generic iPhone Case, Brand: Incipio
• Result 4: Generic iPhone Case, Brand: Incipio
• Result 5: Generic iPhone Case, Brand: Incipio
• Result 6: iPhone 6s + Case, Brand: Apple
• Result 7: Off-Brand iPhone Case, Brand: AAA Phone Cases
• Result 8: Off-Brand iPhone Case, Brand: AAA Phone Cases
• Result 9: Off-Brand iPhone Case, Brand: AAA Phone Cases
• Result 10: Off-Brand iPhone Case, Brand: AAA Phone Cases
14. Default Facet Sorting Methods
• Name Sort: sort based on alphabetical order of facet values
• Count Sort: sort based on result count per facet value
Let's see how they do
15. Name Sort vs. Count Sort
Name Sort, Brands
• AAA Phone Cases
• Apple
• Incipio
• Otterbox
Count Sort, Brands
• AAA Phone Cases, Count: 4
• Incipio, Count: 3
• Otterbox, Count: 2
• Apple, Count: 1
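The two built-in orderings can be reproduced on the ten example results with a few lines of Python. This is only an illustrative sketch of what name sort and count sort produce, not how Solr computes them internally; the variable names are made up for the example.

```python
from collections import Counter

# Brand of each of the ten example results, in ranked order (from the slides).
result_brands = (
    ["Otterbox"] * 2 + ["Incipio"] * 3 + ["Apple"] + ["AAA Phone Cases"] * 4
)

# Number of results per brand, i.e. the facet counts.
facet_counts = Counter(result_brands)

# Name sort: alphabetical order of the facet values.
name_sorted = sorted(facet_counts)

# Count sort: highest result count first, ties broken alphabetically.
count_sorted = sorted(facet_counts, key=lambda b: (-facet_counts[b], b))

print(name_sorted)
print(count_sorted)
```

In both orderings the off-brand "AAA Phone Cases" lands at the top, even though its products only appear at positions 7 to 10 of the search results.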
16. JSON Facets
• We can sort on a value associated with a facet
• Values must be written to an indexed field
• Let's add a static score to the mix and sort on that!
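A request of this kind might look roughly like the sketch below, which builds a JSON Facet API body that sorts brand buckets by an aggregated per-document score. The field names `brand` and `static_score` and the bucket label `brand_score` are hypothetical, and using `max` as the aggregation is an assumption; the slides do not say which aggregation was used.

```python
import json


def build_static_score_facet(query, field="brand", score_field="static_score", limit=10):
    """Build a JSON request body that sorts facet buckets by a nested statistic."""
    return {
        "query": query,
        "facet": {
            "brands": {
                "type": "terms",
                "field": field,
                "limit": limit,
                # Sort buckets by the nested statistic defined below.
                "sort": "brand_score desc",
                # Assumed aggregation: highest static score among a brand's documents.
                "facet": {"brand_score": f"max({score_field})"},
            }
        },
    }


request_body = build_static_score_facet("iphone 6 cases")
print(json.dumps(request_body, indent=2))
```

The resulting body would be posted to a collection's query endpoint; the score field has to be populated at index time, which is where the maintenance cost of this approach comes from.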
17. Search results for iPhone 6 Case
• Result 1: Otterbox iPhone Case, Brand: Otterbox, Score: 30
• Result 2: Otterbox iPhone Case, Brand: Otterbox, Score: 30
• Result 3: Generic iPhone Case, Brand: Incipio, Score: 20
• Result 4: Generic iPhone Case, Brand: Incipio, Score: 20
• Result 5: Generic iPhone Case, Brand: Incipio, Score: 20
• Result 6: iPhone 6s + Case, Brand: Apple, Score: 100
• Result 7: Off-Brand iPhone Case, Brand: AAA Phone Cases, Score: 1
• Result 8: Off-Brand iPhone Case, Brand: AAA Phone Cases, Score: 1
• Result 9: Off-Brand iPhone Case, Brand: AAA Phone Cases, Score: 1
• Result 10: Off-Brand iPhone Case, Brand: AAA Phone Cases, Score: 1
19. Results – Built-In Sorting Methods
Methods compared: Name, Count, Static Score
Criteria:
• Top facets are relevant to query
• Top facet order reflects relevancy
• Easy to maintain over time
• Acceptable latency in production
21. Score Sort
• Try sorting on score:
• msg: "undefined field: "score"",
org.apache.solr.common.SolrException: undefined field: "score"
at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1231)
• Not supported out of the box
• How could we add support for this?
22. Custom Collector Logic
• Could be implemented via a custom collector
• Would alter select facets
• Would require extra effort when performing Solr upgrades
• Could have a negative performance impact
• Might need additional logic to support grouping/collapsing
23. API Wrapper
• Run an API wrapper around Solr
• Re-sort facets in wrapper
• Easy to add custom business rules
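The re-sorting step in such a wrapper could be sketched as follows. The function name, the document layout, and the relevancy scores are all hypothetical, and ranking each facet value by the best score among the returned documents is an assumed rule chosen for illustration, not necessarily the one used at Rakuten.

```python
def resort_facets_by_score(docs, facet_values, facet_field="brand"):
    """Re-order facet values by the best relevancy score seen for each value."""
    best = {}
    for doc in docs:
        value = doc[facet_field]
        best[value] = max(best.get(value, float("-inf")), doc["score"])
    # Values with no scored document sort last, keeping their incoming order.
    return sorted(facet_values, key=lambda v: best.get(v, float("-inf")), reverse=True)


# Hypothetical relevancy scores for the example results (not from the slides).
docs = [
    {"brand": "Otterbox", "score": 9.1},
    {"brand": "Otterbox", "score": 8.7},
    {"brand": "Incipio", "score": 7.5},
    {"brand": "Apple", "score": 5.0},
    {"brand": "AAA Phone Cases", "score": 3.2},
]

# Facet values as returned by Solr with the default count sort.
count_sorted = ["AAA Phone Cases", "Incipio", "Otterbox", "Apple"]

print(resort_facets_by_score(docs, count_sorted))
```

Because the re-sort happens outside Solr, business rules such as pinning or hiding a brand can be applied in the same place without touching the search cluster.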
24. Score Sort
Is this the best sort order?
Brands
• Otterbox
• Incipio
• Apple
• AAA Phone Cases
25. Blended Approach
• Use both result scores and user data
• Use machine learning to blend the scores together
• Otterbox: 30 User Clicks
• Incipio: 50 User Clicks
• Apple: 10 User Clicks
• AAA Phone Cases: 1 User Click
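As a rough illustration of blending, the sketch below combines a per-brand result score with the click counts above using a fixed weighted sum. This is a stand-in for the machine-learned blend mentioned on the slide, not the actual model: the result scores, the equal weights, and the min-max normalization are all assumptions made for the example.

```python
def min_max(values):
    """Scale a dict of numbers to the range 0..1 (all zeros if they are equal)."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def blend(result_scores, click_counts, w_score=0.5, w_clicks=0.5):
    """Rank facet values by a weighted sum of normalized score and clicks."""
    s = min_max(result_scores)
    c = min_max(click_counts)
    blended = {b: w_score * s[b] + w_clicks * c.get(b, 0.0) for b in s}
    return sorted(blended, key=blended.get, reverse=True)


# Hypothetical per-brand result scores (not from the slides).
result_scores = {"Otterbox": 9.1, "Incipio": 7.5, "Apple": 5.0, "AAA Phone Cases": 3.2}
# User click counts from the slide.
click_counts = {"Otterbox": 30, "Incipio": 50, "Apple": 10, "AAA Phone Cases": 1}

print(blend(result_scores, click_counts))
```

With these made-up inputs and equal weights, Incipio's higher click count is enough to move it ahead of Otterbox; a learned model would set the weights from data rather than by hand.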
28. Impact of API Wrapper
• Coverage of significant user queries
• Can be used with grouping
• Most calculations are done offline
• No major impact on search latency
• 99% response time impact of less than 5 ms
29. Results – Relevancy-Based Sorting Methods
Methods compared: Score – Custom Collector, Score – API Wrapper, Blended
Criteria:
• Top facets are relevant to query
• Top facet order reflects relevancy
• Easy to maintain over time
• Acceptable latency in production
31. Conclusions
• Built-in facet sorting methods are not always optimal for relevancy
• Sorting facets based on result score can improve relevancy
• Integrating external signals (such as user data) makes the solution more robust