Live search metrics for smart query optimization
Pavel Penchev
SEARCHHUB.IO
Self-learning search systems are a hot topic but they can only be as good as the data they get. Optimizing search sits at the core of what we do at SearchHub thus high quality data is a must. We aim to make search a KPI driven experience but finding a tool that can track important metrics such as MRR (mean reciprocal rank) proved to be a challenge. That's why we developed SearchHub Collector - a generic SDK providing search-specific live metrics that can be fed in real time into a self-optimizing system. We are publishing the code as open source. In this talk we want to share the rationale behind it and some of the learnings we gathered along the way.
Pavel is a long time search professional with more than 10 years experience at Fredhopper, a leading European search vendor, and works now at CXP Commerce Experts, a progressive search company from Pforzheim. Having seen the search world through different roles and perspectives, he love digging into the little details that combine to make a great search experience. Pavel has a background in software engineering, cloud systems and product management. He comes from Sofia, Bulgaria.
2. Guten Tag Damen und Herren
Damenmotorradlederhandschuhe
Warmduscher
Turnbeutelvergesser
Damenetuikleid
Weinkühlschrank
TM
3. 10+ years at Fredhopper
Seen Search from different perspectives
Now part of SearchHub@CXP
1
2
3
About me
4. Search is about helping people
find what they need. It‘s not
judged by some abstract
definitions of accuracy. It‘s
jugded by how much it helps
people.
7. Why no Search Analytics off-the-shelf?
NO REAL OFF-
THE-SHELF
OPTIONS
REQUIRES CUSTOMER
INVESTMENT
DISCONNECTED
FROM SEARCH
Different expressivity
Always requires post-processing
Lacks search-specific capabilities
Site changes & Communication
Access and setup
Alignment
GA / Econda / PIWIK are great
but lack important features
How to get multivariate query
analytics?
How to cluster, normalise and
aggregate query dependent
KPIs
8. SEARCH COLLECTOR
JavaScript SDK for Search Event collection
Open sourced under MIT license at https://github.com/searchhub/search-collector
9. Basic Architecture - Assemble Your Own Solution
Collect search specific events from search & navigation pages based on query selectors,
events and human interaction with the page
A search is not an isolated event, it happens within a context that carries a lot of useful data
points
COLLECTORS
CONTEXT RESOLVERS
Components for assembling pipelines for writing data in the desired format and endpoint
type
WRITERS
10. Sample integration code
var c = new SearchCollector.Collector({
“resolver” : new SearchCollector.SessionCookieResolver(),
“endpoint” : “https://sqs.aws…….amazonws.com"
}
c.add(new ProductClickCollector(“.product-item”, e => e.get(“id”));
c.add(new ImpressionCollector(“.product-item”, e => e.get(“id”));
c.start();
13. Behind the scenes
Buffer the events and group them before sending, necessary for
things like impressions, data is kept across page reloads
DELAYED WRITING
An impression consist of a product rendering entering the
viewport
IMPRESSION DETECTION
Handle not only the elements that are currently rendered on the
page but also dynamically inserted elements
DOM INSERTION DETECTION
The context of the search is of great importance, we resolve the
search and the filtering and attach it to each event
QUERY RESOLVING
Writing supported for REST interfaces, SQS and compatible
queues
SQS SUPPORT
Unit testing of event collection via Puppeteer headless browser
testing framework
HEADLESS TESTS
15. RESOURCES
The Search Collector source code, published under MIT license
https://github.com/searchhub/search-collector
Learn more about SearchHub
https://searchhub.io
Search Engine
Application
(Ecommerce Platform)
User
Analytics
16. THANK YOU
CXP Commerce Experts GmbH
Am Schoßgatter 3 / 75172 Pforzheim
info@commerce-experts.com
+49 (0)7231 203 676 5