6. 01
Ingredients for baking a relevance cake
Understand what relevance means
Prepare the content
Control the scoring algorithm
A process & culture in the organization to structurally improve relevance
Domain experts
Judgement lists
User behavioral data
Entity detection
Classification
Vocabulary management
Construct the query
Single field / multi-field search
Term-centric vs field-centric
Grammar parsing
(External) query processing
Re-shape and tune the scoring algorithm:
• Customized similarity
• Function queries
Understand the EXPLAIN feedback:
• Querylog tool
Prepare the Solr configuration
How to tokenize
Balance recall vs precision via the analyzers
Feature modeling
Institutionalize engineering of relevance through the organization
Measure expectation vs reality via a diff with Golden Answers
Automated relevance tests
Work iteratively
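Measuring expectation vs reality can start very small. The sketch below is only an illustration: the document ids, the golden answers and the helper names are hypothetical, and precision@k is just one possible metric for an automated relevance test.

```python
def precision_at_k(ranked_ids, golden_ids, k):
    """Fraction of the top-k results that appear in the golden answers."""
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in golden_ids)
    return hits / len(top_k)


def diff_with_golden(ranked_ids, golden_ids, k):
    """Return (golden answers missing from the top k, unexpected results in the top k)."""
    top_k = ranked_ids[:k]
    missing = [g for g in golden_ids if g not in top_k]
    unexpected = [d for d in top_k if d not in golden_ids]
    return missing, unexpected


# Hypothetical judgement list for one query, and a hypothetical result list.
golden = ["law-12", "ruling-7", "news-3"]
results = ["law-12", "form-1", "ruling-7", "archived-9"]

score = precision_at_k(results, golden, k=3)
missing, unexpected = diff_with_golden(results, golden, k=3)
print(score, missing, unexpected)
```

Run over a whole judgement list after every configuration change, a check like this turns "did relevance get better?" into a number that can be tracked per iteration.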
7. 01
Score (ClassicSimilarity) =
Multiplicative function query *
Multiplicative function query *
Multiplicative function query *
coord(q,d) *
queryNorm(q) *
(
Additive function query +
Additive function query +
Additive function query +
∑(for each term)
(
TermBoost(t) *
FieldBoost(d) *
TF(t in d) *
IDF(t)² *
Norm(t,d) *
Payload(t)
)
)
Score (BM25Similarity) =
Multiplicative function query *
Multiplicative function query *
Multiplicative function query *
(
Additive function query +
Additive function query +
Additive function query +
∑(for each term)
(
TermBoost(t) *
FieldBoost(d) *
TF(t in d, Norm) *
IDF(t)² *
Payload(t)
)
)
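ClassicSimilarity: default before Solr 6.0. BM25Similarity: default from Solr 6.0.
ClassicSimilarity components:
queryNorm(q) = $1 / \sqrt{\mathit{sumOfSquaredWeights}}$
coord(q,d) = $\mathit{overlap} / \mathit{maxOverlap}$
TF(t in d) = $\sqrt{\mathit{freq} / (1 + \mathit{distance})}$
IDF(t) = $1 + \ln(\mathit{numDocs} / (\mathit{docFreq} + 1))$
Norm(t,d) ≈ $\mathit{indexBoost} \cdot 1 / \sqrt{\mathit{numTerms}}$
BM25Similarity components:
TF(t in d, Norm) = $f \cdot (k_1 + 1) / (f + k_1 \cdot (1 - b + b \cdot \mathit{fieldLength} / \mathit{avgFieldLength}))$, with $f = \mathit{freq} / (1 + \mathit{distance})$
IDF(t) = $\ln((\mathit{numDocs} - \mathit{docFreq} + 0.5) / (\mathit{docFreq} + 0.5))$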
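To get a feel for how the TF and IDF components behave, they can be computed by hand. This is a minimal sketch, not the Lucene implementation: it assumes the standard BM25 length normalisation with parameters k1 and b, and the function names, default values and sample numbers are illustrative.

```python
import math


def classic_tf(freq, distance=0):
    # ClassicSimilarity: square root of the (slop-discounted) frequency.
    return math.sqrt(freq / (1 + distance))


def classic_idf(num_docs, doc_freq):
    # ClassicSimilarity: 1 + ln(numDocs / (docFreq + 1)).
    return 1 + math.log(num_docs / (doc_freq + 1))


def bm25_tf(freq, field_length, avg_field_length, k1=1.2, b=0.75, distance=0):
    # BM25: frequency saturates towards k1 + 1; b controls length normalisation.
    f = freq / (1 + distance)
    length_norm = k1 * (1 - b + b * field_length / avg_field_length)
    return f * (k1 + 1) / (f + length_norm)


def bm25_idf(num_docs, doc_freq):
    # BM25: ln((numDocs - docFreq + 0.5) / (docFreq + 0.5)).
    return math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


# Compare how both TF curves grow for a field of average length.
for freq in (1, 4, 16, 64):
    print(freq, round(classic_tf(freq), 3), round(bm25_tf(freq, 10, 10), 3))
```

The classic TF keeps growing with the square root of the frequency, while the BM25 TF flattens out below k1 + 1. That saturation is the main behavioural difference to keep in mind when moving from ClassicSimilarity to BM25Similarity.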
8. 01
Create your own customizable Similarity class
MySimilarityFactory.java: extends SchemaSimilarityFactory; creates the MySimilarity instance and passes parameters from the schema
MySimilarity.java: extends BM25Similarity; customizations to the algorithm happen here
Make jar
Reference jar from solrconfig.xml
Add at the bottom of schema.xml
<similarity class="solr.SchemaSimilarityFactory" />
Add inside fieldType definition
<fieldType ...>
  <analyzer> ... </analyzer>
  <similarity class="com.mycompany.MySimilarityFactory">
    <str name="tuneIDF">1.0</str> <!-- IDF -->
    <str name="tuneSloppyFreq">0</str> <!-- slop -->
    <str name="tunePayload">3</str> <!-- payloads -->
    <str name="k1">1.5</str> <!-- TF -->
    <str name="b">1.0</str> <!-- TF -->
  </similarity>
</fieldType>
The global SchemaSimilarityFactory delegates similarity to the per-field setting
The <str> parameters inside the fieldType set the tuning parameters
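The factory's job of passing parameters from the schema boils down to reading string values and converting them to numbers. The sketch below mimics that step outside Solr, which can be handy for sanity-checking a configuration. It is an assumption-laden illustration: `SimilarityParams` and `read_params` are hypothetical names, and the real factory would be written in Java against the Solr API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# The per-field similarity block from the fieldType definition.
SIMILARITY_XML = """
<similarity class="com.mycompany.MySimilarityFactory">
  <str name="tuneIDF">1.0</str>
  <str name="tuneSloppyFreq">0</str>
  <str name="tunePayload">3</str>
  <str name="k1">1.5</str>
  <str name="b">1.0</str>
</similarity>
"""


@dataclass
class SimilarityParams:  # hypothetical container for the tuning values
    tune_idf: float
    tune_sloppy_freq: float
    tune_payload: float
    k1: float
    b: float


def read_params(xml_text):
    """Parse the <str> parameters and convert them from strings to floats."""
    root = ET.fromstring(xml_text.strip())
    raw = {el.get("name"): float(el.text) for el in root.findall("str")}
    return SimilarityParams(
        tune_idf=raw["tuneIDF"],
        tune_sloppy_freq=raw["tuneSloppyFreq"],
        tune_payload=raw["tunePayload"],
        k1=raw["k1"],
        b=raw["b"],
    )


params = read_params(SIMILARITY_XML)
print(params)
```

Because the values arrive as strings, a missing or misspelled parameter name fails here with a `KeyError` instead of silently falling back to a default.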
13. 01
Other items you might want to make tunable via MySimilarity
• Tune the impact of queryNorm and coord (in the case of ClassicSimilarity)
  Or just set them to a hard 1, like I did
• Tune the impact of distance on TF
  In the case of a phrase + slop query
• Tune the curve & mapping of Payloads
  This is a hard 1 by default
• Tune the 255 entries in the Norm byte
  Small range for small fields, big range for big fields => higher precision for each case
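One way to make the impact of distance on TF tunable is to put an exponent on the slop discount. This is a sketch of the idea only: the `tune` parameter and its meaning are assumptions for illustration, not the behaviour of the tuneSloppyFreq setting.

```python
def sloppy_freq(distance, tune=1.0):
    """Discount applied to a sloppy phrase match at the given edit distance.

    tune = 1 gives the familiar 1 / (1 + distance);
    tune = 0 switches the distance penalty off;
    tune > 1 punishes distance harder. (Hypothetical parameter.)
    """
    return 1.0 / (1 + distance) ** tune


for tune in (0, 1, 2):
    print(tune, [round(sloppy_freq(d, tune), 3) for d in range(4)])
```

With `tune=0` every match within the slop window counts as a full match, so proximity no longer influences the ranking.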
14. 01
Score (ClassicSimilarity) =
Multiplicative function query *
Multiplicative function query *
Multiplicative function query *
coord(q,d) *
queryNorm(q) *
(
Additive function query +
Additive function query +
Additive function query +
∑(for each term)
(
TermBoost(t) *
FieldBoost(d) *
TF(t in d) *
IDF(t)² *
Norm(t,d) *
Payload(t)
)
)
Score (BM25Similarity) =
Multiplicative function query *
Multiplicative function query *
Multiplicative function query *
(
Additive function query +
Additive function query +
Additive function query +
∑(for each term)
(
TermBoost(t) *
FieldBoost(d) *
TF(t in d, Norm) *
IDF(t)² *
Payload(t)
)
)
ClassicSimilarity: default before Solr 6.0. BM25Similarity: default from Solr 6.0.
Multiplicative function queries: amplifying the terms score, e.g.
Ratings
Pagerank
Profitability
Popularity
Document type
Page views
Geo-distance
Freshness
Additive function queries: extending the terms score, e.g.
Implicit proximity
Exact title match
Shingle match
Author name in author field
15. 01
Function queries: index time boost
<field name="docboost" type="int" indexed="false" stored="false" docValues="true" />
<add>
  <doc>
    <field name="id">id1</field>
    <field name="body">Life is like riding a bicycle.</field>
    <field name="docboost">2</field>
  </doc>
  <doc>
    <field name="id">id2</field>
    <field name="body">To keep your balance, you must keep moving.</field>
    <field name="docboost">5</field>
  </doc>
</add>
q=*:*&boost=field(docboost)   (multiplicative)
q=*:*&bf=field(docboost)   (additive)
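The difference between the two parameters is plain arithmetic on the score. A minimal sketch, assuming a base score of 1.0 for every document (a simplification for the match-all query) and ignoring any normalisation Solr applies; the function names are illustrative.

```python
def multiplicative_boost(base_score, docboost):
    # boost=field(docboost): the terms score is multiplied by the field value.
    return base_score * docboost


def additive_boost(base_score, docboost):
    # bf=field(docboost): the field value is added to the terms score.
    return base_score + docboost


docboosts = {"id1": 2, "id2": 5}  # values from the indexed documents
base_score = 1.0                  # assumption: same base score for both documents

for doc_id, docboost in docboosts.items():
    print(doc_id,
          multiplicative_boost(base_score, docboost),
          additive_boost(base_score, docboost))
```

Note the edge case at docboost 0: a multiplicative boost wipes out the terms score entirely, while an additive boost leaves it untouched.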
docboost per doctype, assigned by a domain expert:
doctype               docboost
Archived              0
Form                  1
Pending-legislation   2
Case-law              3
Ruling                4
Annotation            5
News                  6
Practice-aid          7
Regulation            8
Law                   9
Explanations          10
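The doctype mapping can be applied when the update message is built, so the expert's ranking lives in one place. A minimal sketch using only the standard library: the `doctype` key on the input records and the function name are assumptions, and the doctypes given to the two sample documents are chosen here purely to match their docboost values.

```python
import xml.etree.ElementTree as ET

# docboost per doctype, as provided by the domain expert.
DOCTYPE_BOOST = {
    "Archived": 0, "Form": 1, "Pending-legislation": 2, "Case-law": 3,
    "Ruling": 4, "Annotation": 5, "News": 6, "Practice-aid": 7,
    "Regulation": 8, "Law": 9, "Explanations": 10,
}


def to_add_xml(docs):
    """Build a Solr <add> message, deriving docboost from each document's doctype."""
    add = ET.Element("add")
    for record in docs:
        doc = ET.SubElement(add, "doc")
        for name in ("id", "body"):
            field = ET.SubElement(doc, "field", name=name)
            field.text = record[name]
        boost = ET.SubElement(doc, "field", name="docboost")
        boost.text = str(DOCTYPE_BOOST[record["doctype"]])
    return ET.tostring(add, encoding="unicode")


docs = [
    {"id": "id1", "body": "Life is like riding a bicycle.",
     "doctype": "Pending-legislation"},  # hypothetical doctype
    {"id": "id2", "body": "To keep your balance, you must keep moving.",
     "doctype": "Annotation"},           # hypothetical doctype
]
xml_out = to_add_xml(docs)
print(xml_out)
```

An unknown doctype raises a `KeyError`, which is usually preferable to indexing a document with a silently wrong boost.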
16. Tuning multiplicative boost?
Scale up or down by summing?
boost=sum(field(docboost),$tuneBoost)
tuneBoost = 0 (neutral starting point)
tuneBoost: 0 → 5: smaller effect
tuneBoost: 0 → -5: bigger effect? but not consistently…
nope
Tuning additive boost
Scale up or down by multiplying
bf=product(field(docboost),$tuneBoost)
tuneBoost = 1 (neutral starting point)
tuneBoost: 1 → 0.2: smaller effect
tuneBoost: 1 → 5: bigger effect
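Why summing is an awkward tuning knob for a multiplicative boost, while multiplying works for an additive one, shows up in a few lines of arithmetic. A sketch with illustrative helper names, using the docboost values 5 and 2 from the sample documents:

```python
def ratio_after_sum(high, low, tune_boost):
    """Relative advantage of the higher multiplicative boost after adding tune_boost.

    Returns None when the shifted lower boost is zero or negative,
    because the ratio is then no longer meaningful.
    """
    shifted_low = low + tune_boost
    if shifted_low <= 0:
        return None
    return (high + tune_boost) / shifted_low


def gap_after_product(high, low, tune_boost):
    """Score difference between two additive boosts after multiplying by tune_boost."""
    return (high - low) * tune_boost


# Multiplicative boost tuned by summing: the effect shrinks, grows, then breaks.
for tune in (0, 5, -1, -5):
    print("sum", tune, ratio_after_sum(5, 2, tune))

# Additive boost tuned by multiplying: the effect scales predictably.
for tune in (1, 0.2, 5):
    print("product", tune, gap_after_product(5, 2, tune))
```

Adding 5 shrinks the advantage from 2.5x to about 1.43x, and subtracting 1 grows it to 4x. Subtracting 5 pushes the boosts to zero or below, which is the inconsistency that rules this approach out.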
22. 01
After a certain high level of technical skill is achieved,
science and art tend to coalesce in esthetics, plasticity,
and form.
The greatest scientists are artists as well.
Albert Einstein