Evolving Search Relevancy: Presented by James Strassburg, Direct Supply

Presented at Lucene/Solr Revolution 2014

Published in: Software

  1. 1. Evolving Search Relevancy James Strassburg Senior Architect - Direct Supply @jstrassburg
  2. 2. Agenda • An Optimization Problem • Genetic Algorithm Overview • Modeling Solr Parameters • Fitness Function
  3. 3. sir can you help me… ???? "iam from indonesia want to build search engine like a Google and i want to build the system using Genetic Algorithm but iam confused what will i do first. Thanks before."
  4. 4. Search Algorithm Parameters /select?q=foo&defType=dismax&qf=name^20+desc^10&pf=name^10&ps=3&mm=2&bf="ord(popularity)^0.05" and many more
  5. 5. Where did those numbers come from? I made them up… shhhhhhh. Then we tweaked them after testing.
  6. 6. An Optimization Problem So, how do we know we have the best set of numbers? Or even a good set? We have an optimization problem.
  7. 7. Sample Schema <field name="name" type="text_en" indexed="true" stored="true" required="true" multiValued="false" omitNorms="true"/> <field name="description" type="text_en" indexed="true" stored="true" multiValued="false" omitNorms="true"/>
  8. 8. Sample Data Set [{ "name":"Red Lobster", "description":"We deliver the freshest caught seafood every day." },{ "name":"Joe's Crab Shack", "description":"We serve delicious red crabs, rock crabs, large lobsters, and other delicious seafood. Our lobsters are our specialty."}] http://localhost:8983/solr/restaurantsCollection/select?q=red+lobster&defType=dismax&qf=name+description&indent=true&fl=name+description
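
A minimal sketch of how the sample query above could be assembled in Python using only the standard library. The helper name `build_select_url` is hypothetical; the host, collection name, and parameters are the ones shown on the slide. Note that `urlencode` percent-encodes `^`, so a boost such as `name^20` would appear as `name%5E20` in the URL.

```python
from urllib.parse import urlencode

# Base URL of the select handler, as shown on the slide.
SOLR_SELECT = "http://localhost:8983/solr/restaurantsCollection/select"


def build_select_url(base, params):
    """Return a Solr select URL; spaces in values are encoded as '+'."""
    return base + "?" + urlencode(params)


url = build_select_url(
    SOLR_SELECT,
    {
        "q": "red lobster",
        "defType": "dismax",
        "qf": "name description",  # query fields, no boosts yet
        "indent": "true",
        "fl": "name description",  # fields to return
    },
)
print(url)
```

With equal weights on both fields, nothing in the query says that a match on `name` should count for more than a match on `description`, which is the kind of weighting the boost parameters are meant to control.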
  9. 9. Genetic Algorithms • A tool for solving optimization problems • Based on ideas from genetics, evolution, and natural selection • DEAP – Distributed Evolutionary Algorithms in Python
  10. 10. Genetic Algorithms • Define candidate solution encoding • Define a fitness function • Generate random solutions • Select candidates for reproduction • Use crossover and mutation to create a new generation • Repeat until some criterion is met
  11. 11. Crossover and Mutation Parent 1: [1,0,1,1,1,0,1,1] Parent 2: [0,0,0,0,1,1,1,1] Child: [1,0,0,1,1,0,1,0]
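
The steps on the two slides above can be sketched as a small loop in plain Python. This is an illustration under stated assumptions, not the talk's DEAP-based implementation: one-point crossover, bit-flip mutation, tournament selection, and elitism are common choices picked here for brevity, and all function names and default values are hypothetical. The toy fitness function simply counts the 1 bits.

```python
import random


def one_point_crossover(parent1, parent2, rng):
    """Take the head of parent1 and the tail of parent2 at a random cut."""
    point = rng.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:]


def flip_bit_mutation(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def tournament_select(population, fitness, k, rng):
    """Pick k random candidates and return the fittest of them."""
    return max(rng.sample(population, k), key=fitness)


def evolve(fitness, n_bits=8, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Generate random candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    # Stopping criterion (an assumption): a fixed number of generations.
    for _ in range(generations):
        # Elitism: carry the current best candidate over unchanged.
        next_generation = [max(population, key=fitness)]
        while len(next_generation) < pop_size:
            p1 = tournament_select(population, fitness, 3, rng)
            p2 = tournament_select(population, fitness, 3, rng)
            child = one_point_crossover(p1, p2, rng)
            next_generation.append(
                flip_bit_mutation(child, mutation_rate, rng))
        population = next_generation
    return max(population, key=fitness)


# Toy problem: maximize the number of 1 bits in an 8-bit candidate.
best = evolve(fitness=sum)
print(best, sum(best))
```

For search relevancy, the bit list would encode Solr boost values and the fitness function would score the search results those boosts produce.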
  12. 12. Encoding Parameters >>> sys.float_info sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)
  13. 13. Encoding Parameters >>> import numpy >>> single = numpy.float32(3.4) >>> single 3.4000001 >>> half_single = numpy.float16(3.4) >>> half_single 3.4004
  14. 14. Encoding Parameters /select?q=foo&qf=field^35.2 versus /select?q=foo&qf=field^35.3
  15. 15. Decimal / Fibonacci Encoding • 0, 0.2, 0.4, 0.6, 0.8, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144 • 16 values encode into 4 bits • Supports fast evolution • Avoids relative maxima
  16. 16. Decimal / Fibonacci Encoding 0.0 => [0, 0, 0, 0] 0.2 => [0, 0, 0, 1] 0.4 => [0, 0, 1, 0] … 1 => [0, 1, 0, 1] 2 => [0, 1, 1, 0] … 144 => [1, 1, 1, 1]
  17. 17. Candidate Solution Encoding /select?q=foo&qf=name^0.4+desc^13 0.4 => [0, 0, 1, 0] 13 => [1, 0, 1, 0] Candidate Solution: [0, 0, 1, 0, 1, 0, 1, 0]
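
A minimal sketch of the encoding described on the three slides above, assuming a simple lookup table indexed by the 4-bit value. The helper names are hypothetical. The entry 0.6 at index 3 is inferred from the bit patterns on the slides (1 maps to [0, 1, 0, 1], which is index 5), so treat it as an assumption.

```python
# Sixteen boost values; the list index is the 4-bit code.
# Assumption: 0.6 sits at index 3 (inferred from the slide's bit patterns).
BOOST_VALUES = [0, 0.2, 0.4, 0.6, 0.8, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]


def encode_boost(value):
    """Return the 4-bit code (most significant bit first) for a boost."""
    index = BOOST_VALUES.index(value)
    return [(index >> shift) & 1 for shift in (3, 2, 1, 0)]


def decode_boost(bits):
    """Return the boost value for a 4-bit code."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return BOOST_VALUES[index]


def decode_candidate(candidate, field_names):
    """Split a candidate into 4-bit chunks, one boost per field."""
    return {
        name: decode_boost(candidate[4 * i:4 * i + 4])
        for i, name in enumerate(field_names)
    }


def build_qf(boosts):
    """Format boosts as a Solr qf value such as name^0.4+desc^13."""
    return "+".join(f"{name}^{boost}" for name, boost in boosts.items())


candidate = encode_boost(0.4) + encode_boost(13)
boosts = decode_candidate(candidate, ["name", "desc"])
print(candidate)         # [0, 0, 1, 0, 1, 0, 1, 0]
print(build_qf(boosts))  # name^0.4+desc^13
```

Because every 4-bit pattern maps to a valid boost, any result of crossover or mutation is still a usable candidate.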
  18. 18. Fitness Function • Measure how well a candidate solution solves the problem • Should be very fast
  19. 19. Normalized Discounted Cumulative Gain • Very relevant > relevant > not relevant • Relevant results are more useful if they appear earlier • The score should be independent of the query, so normalized scores are comparable across queries
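
A minimal sketch of one common NDCG formulation, shown to make the three properties above concrete. The grading scale (2 = very relevant, 1 = relevant, 0 = not relevant) and the linear gain are assumptions for illustration; the slides do not say which variant the talk uses. The ideal ordering here is computed from the same judged results, which is also a simplification.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: later ranks contribute less."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance grades of the returned results, in ranked order.
print(ndcg([2, 1, 0]))  # ideal order
print(ndcg([0, 1, 2]))  # same results, worst order
```

Dividing by the ideal DCG keeps the score between 0 and 1 regardless of how many relevant documents a particular query has.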
  20. 20. Precision and Recall Precision – Likelihood that a returned result was correct Recall – Likelihood that a relevant result was returned
  21. 21. F-measure • Harmonic mean of precision and recall • Punishes outliers: a very low precision or a very low recall pulls the score down
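
A minimal sketch of precision, recall, and the balanced F-measure (F1) for a single query, assuming the returned and relevant documents are available as collections of identifiers. The function name and the sample identifiers are hypothetical.

```python
def precision_recall_f1(returned, relevant):
    """Return (precision, recall, F1) for one query."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    # Harmonic mean: dominated by the smaller of the two values.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(
    returned=["a", "b", "c", "d"],
    relevant=["a", "b", "e"],
)
print(precision, recall, f1)
```

Unlike an arithmetic mean, the harmonic mean stays low when either precision or recall is low, so a candidate cannot score well by maximizing only one of them.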
  22. 22. Analytics in Schema <field name="searchTermInteractions" type="lowercase" indexed="true" stored="true" multiValued="true"/>
  23. 23. Demo
  24. 24. Resources • DEAP - https://code.google.com/p/deap/ • My github repo for this example - https://github.com/jstrassburg/evolving-search-relevancy • @jstrassburg