you complete me
Anne Veling – June 5th, 2012 – Berlin Buzzwords – @anneveling
Smart Autocompl... with Solr


Automatic suggestion functionality on most websites is simply a standard Solr fuzzy wildcard query on a concatenated index. For the Dutch public transportation website 9292.nl we created a more contextual autocompletion function that uses the different address fields of the source database to its advantage, which allows it to better understand which address a user means in this highly ambiguous data environment. We will show the methods used and explain how they may help bring autocompletion functionality to the next level for faceted indexes.
This talk was given at Berlin Buzzwords 2012.
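
The contextual approach described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the speaker's implementation (the slides state that the real one was written in Scala as a Solr RequestHandler). The `FIELDS` data, the function names, and the choice to join adjacent terms with a hyphen are all assumptions made for illustration; a plain in-memory set lookup stands in for the per-field Lucene count query.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data standing in for the per-field Lucene indexes.
FIELDS: Dict[str, Set[str]] = {
    "city": {"etten-leur", "breda"},
    "station": {"etten-leur", "breda"},
    "street": {"zeil", "stationsstraat"},
}


def fields_containing(phrase: str) -> List[str]:
    """Return the fields in which the phrase occurs (the 'count > 0' check)."""
    return sorted(name for name, values in FIELDS.items() if phrase in values)


def interpret(query: str) -> List[Tuple[str, List[str]]]:
    """Split a query into phrases and label each with its candidate fields.

    Assumption: adjacent terms are joined with a hyphen, and the longest
    span that matches any field wins. Unmatched terms get an empty list.
    """
    tokens = query.lower().split()
    result: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest remaining span first, then shrink it.
        for j in range(len(tokens), i, -1):
            phrase = "-".join(tokens[i:j])
            hits = fields_containing(phrase)
            if hits:
                result.append((phrase, hits))
                i = j
                matched = True
                break
        if not matched:
            result.append((tokens[i], []))
            i += 1
    return result
```

With this toy data, `interpret("etten leur zeil")` labels `etten-leur` as a possible city or station and `zeil` as a street. A later ranking step would still have to choose between the candidate fields.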

Published in: Technology, Business

  1. you complete me
     Anne Veling – June 5th, 2012 – Berlin Buzzwords
     @anneveling
  2. AGENDA
     • 9292.nl Public Transport Site
     • Naive Address Autocompletion
     • Field Inspection Semantic Autocompletion
     • Conclusions
  3. 9292.NL
     • Largest public transport site of The Netherlands
     • 1M travel advices per day!
     • Complete new site by Q42
       • Linking to existing routing engine
       • Moving from multiple input boxes to one
       • Mobile applications for Windows, iPhone, Android
  4. DATA
     • 10M points
       • Train and metro stations
       • Bus stops
       • Places of Interest
       • Streets
       • Street ranges
       • Addresses
     • Highly ambiguous
       • Streets / city names / POI
       • Spelling mistakes
       • No single order
  5. NAIVE IMPLEMENTATION
     • One concatenated field in Lucene
     • Tune tokenizer/analyzer
     • Tune query analyzer
     • Tune weights
     • Syntax Only
  6. [Chart: quality vs. effort; labels 100% and 80%]
  7. FIELD INSPECTION
     • Taking advantage of
       • Number of fields
       • Speed of Lucene
     • Query Analysis
     • For each term, query in all fields
       • Does it appear in that field? Count > 0?
       • Use that information to do semantic interpretation
  8. [Diagram] etten leur zeil
     • city? station? bus stop? street?
     • city:etten-leur street:zeil
  9. RESULTS
     • Implemented in Scala
     • Lucene RequestHandler in Solr
     • Ajax front-end
  10. TUNING
     • Iterative Tuning
     • Using real user inputs from production log files
     • Regression Testing to track index/algorithm changes over time
       • For how many test queries is the expected result
         • The top result?
         • In the top 5?
  11. CONCLUSIONS
     • Very positive feedback
     • Iterative tuning based on actual user input from log files
       • Regression test
     • Lucene is fast
       • Entire type-ahead still within 40ms
     • But: partner currently evaluating naive-only approach
       • sometimes good enough is good enough
     • Field Inspection will allow high quality selection
       • With fallback to naive syntactic search
  12. THANK YOU
     @anneveling
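
The regression metric from the tuning slide (for how many test queries is the expected result the top result, or in the top 5?) could be computed roughly as follows. This is a hedged sketch, not the speaker's test harness: `hit_rates`, the `cases` mapping, and the `suggest` callable are hypothetical names, and `suggest` stands in for whatever call returns the ranked suggestions.

```python
from typing import Callable, Dict, List


def hit_rates(
    cases: Dict[str, str],
    suggest: Callable[[str], List[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Return the fraction of test queries whose expected result is ranked
    first ("top1") and within the first k suggestions ("topk")."""
    if not cases:
        return {"top1": 0.0, "topk": 0.0}
    top1 = 0
    topk = 0
    for query, expected in cases.items():
        results = suggest(query)
        if results[:1] == [expected]:
            top1 += 1
        if expected in results[:k]:
            topk += 1
    total = len(cases)
    return {"top1": top1 / total, "topk": topk / total}
```

Running the same set of logged queries after every index or algorithm change and comparing the two rates over time shows whether a tuning step helped or caused a regression.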
