you complete me

Anne Veling – June 5th, 2012 – Berlin Buzzwords
                 @anneveling
AGENDA
•   9292.nl Public Transport Site
•   Naive Address Autocompletion
•   Field Inspection Semantic Autocompletion
•   Conclusions
9292.NL
•   Largest public transport site of The Netherlands
•   1M travel advices per day!
•   Complete new site by Q42
     •   Linking to existing routing engine
     •   Moving from multiple input boxes to one
     •   Mobile applications for Windows, iPhone, Android
DATA
•   10M points
     •   Train and metro stations
     •   Bus stops
     •   Places of Interest
     •   Streets
     •   Street ranges
     •   Addresses
•   Highly ambiguous
     •   Streets / city names / POI
     •   Spelling mistakes
     •   No single order
NAIVE IMPLEMENTATION
•   One concatenated field in Lucene
•   Tune tokenizer/analyzer
•   Tune query analyzer
•   Tune weights


•   Syntax Only
100%



   80%




quality




          effort
FIELD INSPECTION
•   Taking advantage of
     •   Number of fields
     •   Speed of Lucene
•   Query Analysis
•   For each term, query in all fields
     •   Does it appear in that field? Count > 0?
     •   Use that information to do semantic interpretation
etten             leur                 zeil
city?
station?
bus stop?
street?




            city:etten-leur          street:zeil
RESULTS
•   Implemented in Scala
•   Lucene RequestHandler in Solr
•   Ajax front-end
TUNING
•   Iterative Tuning
•   Using real user inputs from production log files
•   Regression Testing to track index/algorithm changes over time
     •   For how many test queries is the expected result
          • The top result?
          • In the top 5?
CONCLUSIONS
•   Very positive feedback
•   Iterative tuning based on actual user input from log files
     •   Regression test
•   Lucene is fast
     •   Entire type-ahead still within 40ms
•   But: partner currently evaluating naive-only approach
     •   sometimes good enough is good enough
•   Field Inspection will allow high quality selection
     •   With fallback to naive syntactic search
THANK YOU




@anneveling

Smart Autocompl... with Solr

  • 1.
    you complete me AnneVeling – June 5th, 2012 – Berlin Buzzwords @anneveling
  • 2.
    AGENDA • 9292.nl Public Transport Site • Naive Address Autocompletion • Field Inspection Semantic Autocompletion • Conclusions
  • 4.
    9292.NL • Largest public transport site of The Netherlands • 1M travel advices per day! • Complete new site by Q42 • Linking to existing routing engine • Moving from multiple input boxes to one • Mobile applications for Windows, iPhone, Android
  • 9.
    DATA • 10M points • Train and metro stations • Bus stops • Places of Interest • Streets • Street ranges • Addresses • Highly ambiguous • Streets / city names / POI • Spelling mistakes • No single order
  • 10.
    NAIVE IMPLEMENTATION • One concatenated field in Lucene • Tune tokenizer/analyzer • Tune query analyzer • Tune weights • Syntax Only
  • 11.
    100% 80% quality effort
  • 13.
    FIELD INSPECTION • Taking advantage of • Number of fields • Speed of Lucene • Query Analysis • For each term, query in all fields • Does it appear in that field? Count > 0? • Use that information to do semantic interpretation
  • 14.
    etten leur zeil city? station? bus stop? street? city:etten-leur street:zeil
  • 15.
    RESULTS • Implemented in Scala • Lucene RequestHandler in Solr • Ajax front-end
  • 17.
    TUNING • Iterative Tuning • Using real user inputs from production log files • Regression Testing to track index/algorithm changes over time • For how many test queries is the expected result • The top result? • In the top 5?
  • 18.
    CONCLUSIONS • Very positive feedback • Iterative tuning based on actual user input from log files • Regression test • Lucene is fast • Entire type-ahead still within 40ms • But: partner currently evaluating naive-only approach • sometimes good enough is good enough • Field Inspection will allow high quality selection • With fallback to naive syntactic search
  • 19.