3. Type ahead/Auto Complete
Definition
• Autocomplete is a feature in which an application predicts the rest of
the word or phrase the user is typing.
• It speeds up human-computer interaction when it correctly predicts the
word or phrase the user intends to enter after only a few characters
have been typed.
• In search engines, autocomplete provides users with suggested queries,
keywords, or results as they type their query in the search box.
This is also commonly called autosuggest.
https://en.wikipedia.org/wiki/Autocomplete
4. History of Auto suggest in Search
• Autocomplete for search engines was created at Google in 2004 by
Kevin Gibbs.
• It was an opt-in feature until 2008, when it became the default search
mode.
• Since then, other websites have incorporated auto-suggest into their
own search.
• It has now become a must-have feature in any search engine.
5. Developing a Search Auto Suggest System for an E-Commerce Website
Requirements:
1. Develop an auto-suggest system for search that lists the phrases the
user is likely typing, based on the few characters already entered,
and ranks them by similarity to the user query.
2. Each suggestion displayed should return search results when submitted,
and ideally include the products the customer was looking for.
3. The goal is to save customers time and guide them to the product they
are looking for faster, with an accurate search phrase.
6. Common Implementation Challenges
• Which data sets to use for suggestions?
• Ranking suggestions against the user query.
• Response time.
• Spell correction during indexing.
• Spell correction of user queries.
• Handling word synonyms and word stemming (e.g. car, cars, car’s => car).
• Unit conversions (e.g. 8 foot, 8 ft or 8’).
• Exception list (terms that are never allowed, and terms that are always
added to the suggestion list).
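As one illustration of the unit-conversion challenge, the sketch below maps the spellings from the example (8 foot, 8 ft, 8’) to a single canonical token so that they index and match identically. The function name `normalize_units`, the canonical form `8 ft`, and the regular expression are assumptions made for illustration; they are not taken from the system described in these slides.

```python
import re

# Hypothetical pattern: a number followed by one spelling of "foot".
# Covers "foot", "feet", "ft", "ft." and the apostrophe forms ' and ’.
# The negative lookahead stops words such as "football" from matching.
_FOOT = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:foot|feet|ft\.?|['’])(?![A-Za-z])",
    re.IGNORECASE,
)


def normalize_units(text: str) -> str:
    """Rewrite every foot measurement in `text` as '<number> ft'."""
    return _FOOT.sub(r"\1 ft", text)


if __name__ == "__main__":
    for raw in ("8 foot ladder", "8 ft ladder", "8’ ladder"):
        print(raw, "=>", normalize_units(raw))
```

Applying the same normalization both when indexing suggestions and when reading the user query keeps the two sides comparable.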
7. Data Set for Suggestions.
Data needed to build auto-suggest for an e-commerce website:
• Search query logs (history of all searches in a particular period).
• Submitted search query and the product viewed with it.
• Submitted search query and any product added to cart.
Product views and cart adds are used to calculate a score for each
suggestion.
8. Calculate index time score to boost the document while indexing.
• Each record from the query log is converted to the format
(query, score).
• The scores are aggregated per query.
• The aggregated queries are then added to the index as documents.
• An index-time boost equal to the score is applied to each document.
Index time score/boost
9. Example: (sample query log data)
Search Query Term            Product id   Product views   Cart adds
whirlpool refrigerator       100          1000            50
whirlpool refrigerator       101          500             5
whirlpool top load washer    200          100             10
Samsung top load washer      300          100             20
Top load washer              200          50              2
Top load washer              300          75              10

Aggregated score in (query, score) format
Query                        Score
whirlpool refrigerator       4025
whirlpool top load washer    200
Samsung top load washer      500
Top load washer              229
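The slides do not state the scoring formula. The four aggregated scores in the sample are reproduced exactly if each log record contributes product views plus the square of cart adds, summed per query (for example 1000 + 50² + 500 + 5² = 4025). The sketch below uses that inferred formula; treat `record_score` as an assumption that happens to fit the sample, not as the author's confirmed method.

```python
from collections import defaultdict

# (query, product_id, product_views, cart_adds) rows from the sample query log.
QUERY_LOG = [
    ("whirlpool refrigerator", 100, 1000, 50),
    ("whirlpool refrigerator", 101, 500, 5),
    ("whirlpool top load washer", 200, 100, 10),
    ("Samsung top load washer", 300, 100, 20),
    ("Top load washer", 200, 50, 2),
    ("Top load washer", 300, 75, 10),
]


def record_score(views: int, cart_adds: int) -> int:
    # Assumed formula, inferred from the sample figures only.
    return views + cart_adds ** 2


def aggregate_scores(log):
    """Convert each record to (query, score) and sum the scores per query."""
    totals = defaultdict(int)
    for query, _product_id, views, cart_adds in log:
        totals[query] += record_score(views, cart_adds)
    return dict(totals)


if __name__ == "__main__":
    for query, score in aggregate_scores(QUERY_LOG).items():
        print(f"{query}: {score}")
```

Squaring cart adds weights a purchase signal far more heavily than a view, which is one plausible reason for a formula of this shape.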
10. Index Design
Example
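Fields: DocId, Phrase, prefix_1, word_1, prefix_2, word_2, prefix_3,
word_3, prefix_4, word_4, prefix_other, word_other, Payload

DocId 1
  Phrase:        hammer drill cordless
  prefix_1:      h, ha, ham, hamm, hamme, hammer
  word_1:        hammer
  prefix_2:      d, dr, dri, dril, drill
  word_2:        drill
  prefix_3:      c, co, cor, cord, cordl, cordle, cordles, cordless
  word_3:        cordless
  Payload:       <Payload Data>

DocId 2
  Phrase:        bosch hammer drill bit set
  prefix_1:      b, bo, bos, bosc, bosch
  word_1:        bosch
  prefix_2:      h, ha, ham, hamm, hamme, hammer
  word_2:        hammer
  prefix_3:      d, dr, dri, dril, drill
  word_3:        drill
  prefix_4:      b, bi, bit
  word_4:        bit
  prefix_other:  s, se, set
  word_other:    set
  Payload:       <Payload Data>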
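The index layout above can be sketched as a function that turns a suggestion phrase into positional fields. This is a minimal sketch under assumptions: the names `prefixes` and `build_document` are hypothetical, terms are lowercased, every prefix of each word is emitted, and any word after the fourth is collected into prefix_other and word_other.

```python
def prefixes(word: str) -> list:
    """All leading substrings of `word`, shortest first."""
    return [word[:i] for i in range(1, len(word) + 1)]


def build_document(doc_id: int, phrase: str, payload=None, max_positions: int = 4) -> dict:
    """Build one index document with prefix_n / word_n fields per word position."""
    doc = {"DocId": doc_id, "Phrase": phrase, "Payload": payload}
    other_prefixes, other_words = [], []
    for position, word in enumerate(phrase.lower().split(), start=1):
        if position <= max_positions:
            doc[f"prefix_{position}"] = prefixes(word)
            doc[f"word_{position}"] = word
        else:
            # Words beyond the last numbered position share the *_other fields.
            other_prefixes.extend(prefixes(word))
            other_words.append(word)
    if other_words:
        doc["prefix_other"] = other_prefixes
        doc["word_other"] = other_words
    return doc


if __name__ == "__main__":
    doc = build_document(2, "bosch hammer drill bit set")
    for field, value in doc.items():
        print(f"{field}: {value}")
```

Storing the prefixes at index time trades index size for lookup speed: a partially typed word becomes an exact term match on a prefix field instead of a wildcard scan.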
11. Query
User-entered query: hammer drill co
Query logic:
return phrase where
  (word_1=hammer AND word_2=drill AND prefix_3=co)^boost=1000
  OR (word_2=hammer AND word_3=drill AND prefix_4=co)^boost=10
Matching document:

DocId 1
  Phrase:        hammer drill cordless
  prefix_1:      h, ha, ham, hamm, hamme, hammer
  word_1:        hammer
  prefix_2:      d, dr, dri, dril, drill
  word_2:        drill
  prefix_3:      c, co, cor, cord, cordl, cordle, cordles, cordless
  word_3:        cordless
  Payload:       <Payload Data>
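The query logic above can be sketched in memory as follows. Completed words must match word_n exactly, the last (partial) term must appear in prefix_n, and the same pattern shifted one position to the right earns the lower boost. The function names, the in-memory matching, and the third sample phrase are assumptions for illustration; a real deployment would run this as a boosted boolean query in the search engine, and this sketch ignores the prefix_other and word_other fields.

```python
def prefixes(word):
    """Set of all leading substrings of `word`."""
    return {word[:i] for i in range(1, len(word) + 1)}


def index_phrase(phrase):
    """Positional fields: word_n and the prefix set prefix_n (1-based)."""
    fields = {}
    for n, word in enumerate(phrase.lower().split(), start=1):
        fields[f"word_{n}"] = word
        fields[f"prefix_{n}"] = prefixes(word)
    return fields


def matches(fields, terms, offset):
    """True if the complete terms and the trailing prefix align at `offset`."""
    *complete, partial = terms
    for i, word in enumerate(complete, start=1 + offset):
        if fields.get(f"word_{i}") != word:
            return False
    return partial in fields.get(f"prefix_{len(terms) + offset}", set())


def suggest(phrases, user_input, boosts=((0, 1000), (1, 10))):
    """Return (boost, phrase) pairs, highest boost first."""
    terms = user_input.lower().split()
    if not terms:
        return []
    scored = []
    for phrase in phrases:
        fields = index_phrase(phrase)
        for offset, boost in boosts:
            if matches(fields, terms, offset):
                scored.append((boost, phrase))
                break  # keep only the highest-boost clause that matches
    scored.sort(key=lambda pair: -pair[0])
    return scored


if __name__ == "__main__":
    phrases = [
        "hammer drill cordless",
        "bosch hammer drill bit set",
        "bosch hammer drill cordless",  # hypothetical phrase, not in the slides
    ]
    print(suggest(phrases, "hammer drill co"))
```

With these phrases, "hammer drill cordless" matches the first clause, the hypothetical "bosch hammer drill cordless" matches only the shifted clause, and "bosch hammer drill bit set" matches neither because its fourth word does not start with "co".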
…
13. Apache Lucene
• Fast, high-performance, scalable search/IR library written in Java.
• Open source.
• Supports indexing and searching.
• Provides advanced search options such as synonyms, stop words,
similarity, and proximity.
• Provides index-time boosting of documents.
• Provides query-time boosting.
14. Additional Advanced features
• Season-based auto-suggestions.
• Location-sensitive auto-suggestions.
• Context-based auto-suggestions.
• Additional data from the product catalog:
  • Product titles
  • Product types
  • Product hierarchy
  • Brands
  • SKU IDs
  • Model numbers
• Different combinations of product attributes, brands, and product types.
  • E.g. dewalt 20 volt driver
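Candidate suggestions built from catalog data could be generated by combining brand, attribute, and product type values. The sketch below is a minimal illustration; the function name `combine`, the brand-attribute-type ordering, and every sample value other than "dewalt 20 volt driver" are assumptions.

```python
from itertools import product


def combine(brands, attributes, product_types):
    """Join every brand / attribute / product type combination into a phrase."""
    return [" ".join(parts) for parts in product(brands, attributes, product_types)]


if __name__ == "__main__":
    # "drill" is a hypothetical extra product type added for illustration.
    for phrase in combine(["dewalt"], ["20 volt"], ["driver", "drill"]):
        print(phrase)
```

Generated phrases would still need to be checked against the search index so that every suggestion shown returns results, in line with requirement 2.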