This document provides a framework for prioritizing onsite search problems and the key performance indicators (KPIs) to measure for e-commerce search optimization. It recommends first fixing searches that incorrectly yield no results, then improving the relevance of results, and then reducing false positives. The most essential KPIs include query latency, throughput, and result relevance as measured by click-through rates and NDCG scores. The document also provides tips for self-benchmarking search performance and examples of search performance benchmarks across nine e-commerce sites from various industries.
What gets measured prepares for peak in ecommerce
1. 1
What gets measured…
PREPARING FOR PEAK IN ECOMMERCE
Peter Curran
General Manager, Digital Commerce, Lucidworks
peter.curran@lucidworks.com (415) 378-9663
Updated: August 2020
PETER DRUCKER
2. 2
Agenda
Priorities: You can’t do everything at once. A framework for thinking about what to focus on first.
KPIs: The ideal complement of key performance indicators and when/why to measure them.
Benchmarks: Common, widely available analytics platform KPIs compared across several companies.
Q&A: Jump in whenever you want.
4. 4
How to find the biggest problems
(Admittedly overly simplistic but) helpful framework
relevant not relevant
In search, we can think of the index as divided into relevant results and irrelevant results too. However, it’s a little more complicated because the user can query for anything. E.g. coffe, low acid coffee, kcups, etc.
5. 5
                relevant           not relevant
returned        True Positives     False Positives
not returned    False Negatives    True Negatives
6. 6
True Positives: good!
False Positives: bad
False Negatives: terrible
True Negatives: good!
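The four outcomes above can be counted for a single query. The following is a minimal sketch, assuming you already know which product IDs are relevant for that query (for example, from a hand-built judgment list). The function names and the sample IDs are hypothetical, and precision and recall are the standard information-retrieval measures rather than figures from this deck.

```python
def confusion_counts(index_ids, relevant_ids, returned_ids):
    """Count the four outcomes for one query over a product index."""
    index_ids = set(index_ids)
    relevant_ids = set(relevant_ids)
    returned_ids = set(returned_ids)
    return {
        "tp": len(returned_ids & relevant_ids),              # relevant and returned
        "fp": len(returned_ids - relevant_ids),              # returned but not relevant
        "fn": len(relevant_ids - returned_ids),              # relevant but missing
        "tn": len(index_ids - relevant_ids - returned_ids),  # correctly left out
    }


def precision_recall(counts):
    """Precision = TP / returned, recall = TP / relevant (0.0 when undefined)."""
    returned = counts["tp"] + counts["fp"]
    relevant = counts["tp"] + counts["fn"]
    precision = counts["tp"] / returned if returned else 0.0
    recall = counts["tp"] / relevant if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical six-product index for one query.
    counts = confusion_counts(
        index_ids=["a", "b", "c", "d", "e", "f"],
        relevant_ids=["a", "b", "c"],
        returned_ids=["a", "b", "d"],
    )
    print(counts, precision_recall(counts))
```

Low recall corresponds to the "terrible" false negatives, and low precision to the "bad" false positives.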
7. 7
How to prioritize search problems
How do we order our work on these problems?
First: fix searches that incorrectly yield nothing
Second: get relevant products in, even if you over-recall
Third: order/rank relevant products first
Fourth: fix UI / UX issues
Fifth: get more precise by eliminating false positives
Sixth: put products you prefer to sell at the top
Seventh: move from lexical to semantic understanding
8. 8
Balancing severity and frequency
[Figure: typical distribution of terms in a wide assortment, from head queries to tail queries]
1. Break your traffic into tiers and give each tier a score
   E.g. Head = 10, torso = 5, and tail = 1
   E.g. Deciles by query volume scoring 1-10
2. Score severity 1-3, for example:
   Nulls & under-recall = 3
   Over-recall and sloppy ranking = 2
   Sub-optimal business ranking = 1
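One way to turn the two scores into a work queue is sketched below. The deck gives the tier and severity scores but does not say how to combine them, so multiplying them is an assumption made here for illustration. The dictionary keys, function names, and sample issues are hypothetical.

```python
# Scores taken from the slide; the key names are illustrative.
TIER_SCORES = {"head": 10, "torso": 5, "tail": 1}
SEVERITY_SCORES = {
    "null_or_under_recall": 3,
    "over_recall_or_sloppy_ranking": 2,
    "suboptimal_business_ranking": 1,
}


def priority(tier, problem):
    """Assumed combination: frequency score multiplied by severity score."""
    return TIER_SCORES[tier] * SEVERITY_SCORES[problem]


def rank_issues(issues):
    """Return issues sorted from highest to lowest priority."""
    return sorted(
        issues,
        key=lambda issue: priority(issue["tier"], issue["problem"]),
        reverse=True,
    )


if __name__ == "__main__":
    issues = [
        {"query": "kcups", "tier": "tail", "problem": "null_or_under_recall"},
        {"query": "coffee", "tier": "head", "problem": "suboptimal_business_ranking"},
        {"query": "low acid coffee", "tier": "torso", "problem": "null_or_under_recall"},
    ]
    for issue in rank_issues(issues):
        print(priority(issue["tier"], issue["problem"]), issue["query"])
```

With a product, a severe problem on a torso query can outrank a mild problem on a head query. An additive or weighted scheme would be equally reasonable if it fits your traffic better.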
10. 10
Tagging
Garbage in, garbage out
Analytics are implemented, not instantiated
They are not all the same.
What is a conversion for you?
Most calculated metrics make assumptions
Automated unit tests = essential
Closed-loop features use either:
Clickstream feeds
Custom signal JS
… but not “reports” in most cases
Mobile / App / Tablet / Desktop
12. 12
Categories & Most Essential KPIs
Non-Functional KPIs
Query Latency at 90th Percentile: The time, in ms, within which the engine returned 90% of all queries, according to logs.
Maximum Throughput: The number of queries the engine responded to, per second, at its peak, according to logs.
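Both non-functional KPIs can be derived from query logs. The sketch below assumes each log record has already been parsed into a request timestamp (in seconds) and a latency (in ms). The log format, the function names, and the choice of the nearest-rank percentile method are assumptions, not something the deck specifies.

```python
import math
from collections import Counter


def percentile_latency(latencies_ms, pct=90):
    """Nearest-rank percentile: smallest latency covering pct% of queries."""
    if not latencies_ms:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def max_throughput(timestamps_s):
    """Peak queries per second, bucketing requests by whole second."""
    if not timestamps_s:
        raise ValueError("no requests recorded")
    per_second = Counter(int(t) for t in timestamps_s)
    return max(per_second.values())


if __name__ == "__main__":
    # Hypothetical parsed log: (timestamp in seconds, latency in ms).
    log = [(0.1, 40), (0.5, 55), (0.9, 300), (1.2, 35), (1.3, 60), (2.0, 45)]
    print("p90 latency (ms):", percentile_latency([ms for _, ms in log]))
    print("peak qps:", max_throughput([t for t, _ in log]))
```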
KPI categories overview:
Non-Functional KPIs: Query Latency at 90th Percentile, Maximum Throughput
Build Quality KPIs: NDCG Regression, Post-Build Latency & Throughput Regression
Search Interface Quality KPIs: Search Mix, Concierge Feature Engagement
Relevancy Quality KPIs: NDCG Dashboard, Positional Clickthru Dashboard
13. 13
Categories & Most Essential KPIs
Build Quality KPIs: Because regressions are very easy in search.
NDCG Regression: Compares judgment lists that are highly likely to be objectively true with search results in an automated unit test. A minimum NDCG score (an established benchmark figure from 0-100) is required for the build to pass.
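An NDCG build gate could look roughly like the sketch below. It assumes judgment lists are stored as a mapping from document ID to a graded relevance value, uses the linear-gain form of DCG (an exponential-gain form is also common), and averages NDCG across queries. The function names, the cutoff k, and the threshold value are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(gain / math.log2(pos + 2) for pos, gain in enumerate(gains))


def ndcg(ranked_ids, judgments, k=10):
    """NDCG@k on a 0-100 scale; unjudged documents count as gain 0."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return 100.0 * dcg(gains) / ideal_dcg if ideal_dcg else 0.0


def build_passes(results_by_query, judgments_by_query, threshold, k=10):
    """Pass the build if mean NDCG across judged queries meets the threshold."""
    scores = [
        ndcg(results_by_query.get(query, []), judgments, k)
        for query, judgments in judgments_by_query.items()
    ]
    mean_score = sum(scores) / len(scores)
    return mean_score >= threshold, mean_score


if __name__ == "__main__":
    judgments = {"low acid coffee": {"sku1": 3, "sku2": 2, "sku3": 1}}
    results = {"low acid coffee": ["sku1", "sku3", "sku2"]}
    print(build_passes(results, judgments, threshold=90.0))
```

In a real pipeline the `results` mapping would be filled by running each judged query against the freshly built engine.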
Post-Build Latency and Throughput Regression: Compares a benchmark latency figure (in ms) and throughput figure (in qps) from a past major release to an automated test’s measurement of latency and throughput. Some use average latency; we recommend the 90th percentile. Use an actual production log file from a peak day and spoof it up for your throughput test. Your goal should be stability at 2x-4x observed peaks.
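The comparison against the benchmark release can be automated with a small check like the one sketched here. The 10% tolerance is an assumption added for illustration (the deck does not give one), and the function and parameter names are hypothetical.

```python
def regression_check(benchmark_p90_ms, benchmark_qps,
                     measured_p90_ms, measured_qps, tolerance=0.10):
    """Return True if latency and throughput stay within tolerance of the benchmark.

    tolerance is an assumed allowance: 0.10 lets p90 latency rise by up to 10%
    and throughput fall by up to 10% before the check fails.
    """
    latency_ok = measured_p90_ms <= benchmark_p90_ms * (1 + tolerance)
    throughput_ok = measured_qps >= benchmark_qps * (1 - tolerance)
    return latency_ok and throughput_ok


def load_test_targets(observed_peak_qps, multiples=(2, 4)):
    """Throughput levels to test for stability, as multiples of the observed peak."""
    return [observed_peak_qps * m for m in multiples]


if __name__ == "__main__":
    # Hypothetical figures: benchmark from the last major release vs. this build.
    print(regression_check(200, 500, measured_p90_ms=210, measured_qps=480))
    print(load_test_targets(observed_peak_qps=150))
```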
14. 14
Categories & Most Essential KPIs
Search Interface Quality KPIs
Search Mix: Shows the % of site visitors or sessions that use search features as opposed to browse only. You might break this up into different kinds of search engagement, e.g. type ahead vs. search vs. advanced search vs. browse. You can also express this as average search actions per session.
Concierge Feature Engagement: Shows which concierge features, and how much use of them, are involved in successful and unsuccessful search outcomes. Concierge features are things like sort, guided navigation, type ahead, query-to-query recommendations, etc.
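Search mix can be computed from session-level event data. The sketch below assumes each session is a list of event-type strings produced by your tagging. The event names and the function name are hypothetical and would need to match your own analytics implementation.

```python
from collections import Counter

# Assumed event names for search-type actions; adjust to your own tagging.
SEARCH_ACTIONS = {"search", "type_ahead", "advanced_search"}


def search_mix(sessions):
    """Summarize how many sessions use search and how often they do."""
    total = len(sessions)
    if total == 0:
        raise ValueError("no sessions")
    sessions_by_kind = Counter()
    search_sessions = 0
    search_actions = 0
    for events in sessions:
        actions = [e for e in events if e in SEARCH_ACTIONS]
        search_actions += len(actions)
        if actions:
            search_sessions += 1
        for kind in set(actions):
            sessions_by_kind[kind] += 1
    return {
        "pct_search_sessions": 100.0 * search_sessions / total,
        "pct_browse_only_sessions": 100.0 * (total - search_sessions) / total,
        "avg_search_actions_per_session": search_actions / total,
        "pct_sessions_by_kind": {k: 100.0 * v / total for k, v in sessions_by_kind.items()},
    }


if __name__ == "__main__":
    sessions = [
        ["browse", "search", "search"],
        ["browse"],
        ["type_ahead", "browse"],
        ["browse", "browse"],
    ]
    print(search_mix(sessions))
```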
15. 15
Categories & Most Essential KPIs
Relevancy Quality KPIs: Your main focus.
NDCG Dashboard: Shows NDCG figure for search as a whole, for key queries, for query groups, key interfaces, or other units. NDCG necessarily requires the maintenance of judgment lists which may be onerous and expensive.
Positional Clickthru, MRR: Typically shows the clickthrough rate for the first n positions or position groups/rows/pages. Like NDCG, this metric can be broken down by query, interface, etc. May also be aggregated up into a global average position clickthru. This is a useful metric if NDCG judgment lists are not available.
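Positional clickthrough and MRR can both be computed from click logs without judgment lists. The sketch below assumes each search is recorded as a list of the 1-based result positions that were clicked, and it treats the first clicked position as the "correct" result for MRR, which is a click-based approximation. The function names and sample data are hypothetical.

```python
def positional_ctr(searches, n=5):
    """Fraction of searches with a click at each of positions 1..n."""
    total = len(searches)
    if total == 0:
        return [0.0] * n
    return [
        sum(1 for clicks in searches if pos in clicks) / total
        for pos in range(1, n + 1)
    ]


def mean_reciprocal_rank(searches):
    """Average of 1 / (highest clicked position); searches with no click score 0."""
    if not searches:
        return 0.0
    return sum(
        (1 / min(clicks)) if clicks else 0.0 for clicks in searches
    ) / len(searches)


if __name__ == "__main__":
    # Hypothetical click log: one list of clicked positions per search.
    searches = [[1], [2, 3], [], [1, 4]]
    print("CTR by position:", positional_ctr(searches))
    print("MRR:", mean_reciprocal_rank(searches))
```

The same functions can be run per query, per interface, or across all traffic to get the breakdowns described above.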
16. 16
Some reports
Things we’ve seen and recommended, generally
Essential:
  Zero Results % and Terms
  Term frequency, clickthru, and conversion
  Latency and Throughput
  Search conversion to cart, download, contact, sale, etc.
  Search mix
  Geographical, time of day, device, etc. breakouts
  Average searches per session
Useful:
  Search exit
  Financial outcomes by path
  Concierge feature engagement (broken out)
  Exit related to any automated re-writes or intent classifiers
  Report on “silent” interventions: spell correction, rewriting / entity extraction
Analytical:
  Top Keywords
  Zero Results
  High Bounce Rate Queries
  Top Retried Queries
  Top Query Locations
  Corrected Queries
  Top Facets and Top Facet Value Selections
  Top Sort Contexts
18. 18
Tips for benchmarking
1. Connect with us. We have stats for every industry and can tell you where you stand. info-apac@lucidworks.com
2. Know your tagging. It’s common for search metrics to be polluted by browse activity. Take this use case: someone enters your site, searches for something, ignores the results, clicks on your navigation, and buys something. Will search get credit for the conversion? Should it?
3. Self-benchmark. Compare your search to non-search performance. Your search conversion rate should be a multiple of your browse conversion rate. If the multiple is lower than 2x, irrespective of industry, then there is a problem. Multiples of 3x, 4x, and 5x are common, and even 9x is seen.
4. Separate special use cases. Normal search, where people enter a keyword and click a button, is different from a menu item that executes a search as part of, say, a sale banner or link. Promotional searches like this will naturally have much higher conversion rates and will
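The self-benchmark in tip 3 is simple arithmetic, sketched below. It assumes you can already separate search sessions from browse-only sessions in your analytics (see tip 2 on tagging). The function names and the sample figures are hypothetical, and the 2x floor is the threshold stated in the tip.

```python
def conversion_rate(orders, sessions):
    """Orders divided by sessions; 0.0 when there are no sessions."""
    return orders / sessions if sessions else 0.0


def search_multiple(search_orders, search_sessions, browse_orders, browse_sessions):
    """How many times higher search conversion is than browse conversion."""
    browse_rate = conversion_rate(browse_orders, browse_sessions)
    if browse_rate == 0:
        raise ValueError("browse conversion rate is zero; multiple is undefined")
    return conversion_rate(search_orders, search_sessions) / browse_rate


def needs_attention(multiple, floor=2.0):
    """Flag a problem when the multiple falls below the 2x floor from tip 3."""
    return multiple < floor


if __name__ == "__main__":
    # Hypothetical figures for one reporting period.
    multiple = search_multiple(
        search_orders=60, search_sessions=1000,
        browse_orders=100, browse_sessions=5000,
    )
    print(round(multiple, 2), needs_attention(multiple))
```

Promotional or banner-driven searches (tip 4) are best excluded from the search figures before running this comparison, since they would inflate the multiple.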
19. 19
1 year across 9 sites
Selected industries
Connect with us and we can share more!
20. 20
Conway’s Law
Any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure
How Do Committees Invent?, 1967