Just the Job: Employing Solr for Recruitment Search - Charlie Hull

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

Using a case study on a major European executive recruitment company, we will show how we used Apache Lucene/Solr to build powerful, flexible, accurate and scalable search services over tens of millions of CVs and candidate records, allowing the company to completely restructure their IT provision for both local and national offices.

  1. Just the Job – Employing Apache Solr for Recruitment Search
     Charlie Hull, Flax, charlie@flax.co.uk, @FlaxSearch
     19th October 2011
  5. What I Will Cover
     • Who are Flax?
     • The Project & The Solution
     • How we did it
       – A flexible pipeline in two parts
       – Transforming the UI
       – Performance
       – Issues
       – Results & benefits
     • Conclusions & Lessons Learned
       – Learning to love open source search
  6. Who are Flax?
     • Search engine specialists with decades of experience
     • Based in Cambridge, U.K.
     • Customers include Financial Times, Durrants Ltd., Accenture, University of Cambridge
     • UK Authorised Partner of Lucid Imagination
     • We also run a Search Meetup. Start your own and add it to www.searchmeetups.com!
  9. The Project
     • The client: Reed Specialist Recruitment
     • The data
       – Hundreds of millions of items to search
       – Hundreds of fields in the database schema (which will change in the future)
       – CVs (résumés) in Word and PDF formats
       – Multiple languages
     • The problem
       – Searches take several minutes
       – 3000+ users familiar with the old system
       – No foundation for innovation
  12. The Solution – Apache Solr
      • Flexible and extendable
        – This is only the first wave of development
        – Complex business rules are needed to drive the search (boosts & FunctionQueries)
      • Economically scalable
        – Much more data to come
        – The future cost of commercial, closed-source alternatives is too hard to predict
      • Great support available from … and …
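Boosts and FunctionQueries are how business rules such as "prefer recently updated candidates" get into Solr's ranking. The sketch below is not Reed's configuration: it assumes the edismax query parser, a hypothetical core named `candidates` and hypothetical `cv_text` and `job_title` fields, and only borrows the `boost_date` field name from the pipeline slides. The `recip(ms(NOW,boost_date),3.16e-11,1,1)` expression is the standard recency-boost pattern from the Solr function query documentation.

```python
from urllib.parse import urlencode

# recip(x, m, a, b) = a / (m*x + b). With x = document age in milliseconds and
# m = 3.16e-11 (roughly 1 / milliseconds-per-year), a one-year-old document
# gets about half the boost of a brand-new one.
RECENCY_BOOST = "recip(ms(NOW,boost_date),3.16e-11,1,1)"


def build_search_params(user_query: str, rows: int = 20) -> dict:
    """Build Solr /select parameters for an edismax search with a recency boost."""
    return {
        "q": user_query,
        "defType": "edismax",
        # Hypothetical fields: job titles weighted above the full CV text.
        "qf": "cv_text job_title^2.0",
        # Additive function-query boost applied on top of the text relevance score.
        "bf": RECENCY_BOOST,
        "rows": rows,
        "wt": "json",
    }


if __name__ == "__main__":
    params = build_search_params("java developer cambridge")
    # Hypothetical local Solr core named "candidates".
    print("http://localhost:8983/solr/candidates/select?" + urlencode(params))
```

Keeping the business rule in a single function-query string means it can be tuned or swapped without touching the index.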
  15. A flexible pipeline in two parts
      1. Indexer
         • Reads an XML settings file
         • Extracts data from Oracle
         • Processes it if necessary
         • Adds it to a Solr index
      2. Config tool
         • Creates a Solr schema from the Indexer settings
         • Verifies types and checks for conflicts
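The Indexer's first step is reading its XML settings file. As a rough illustration, the sketch below parses `action` elements of the shape used in the deck's code slides into plain records with Python's standard `xml.etree.ElementTree`. The `<settings>` root element, the attribute defaults and the `ActionSetting` type are assumptions; the real settings file, which also uses Spring `beans:` elements, is not reproduced here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ActionSetting:
    """One <action> from a (hypothetical) Indexer settings file."""
    ref: str          # which action implementation to use, e.g. "copyAction"
    column: str       # source column in the Oracle table
    field: str        # destination Solr field
    solr_type: str    # Solr field type, later read by the config tool
    indexed: bool
    stored: bool


def parse_actions(xml_text: str) -> List[ActionSetting]:
    """Collect every <action> element, wherever it appears in the document."""
    root = ET.fromstring(xml_text)
    return [
        ActionSetting(
            ref=el.get("ref", ""),
            column=el.get("column", ""),
            field=el.get("field", ""),
            solr_type=el.get("type", "string"),            # assumed default
            indexed=el.get("indexed", "true") == "true",   # assumed default
            stored=el.get("stored", "true") == "true",     # assumed default
        )
        for el in root.iter("action")
    ]


SETTINGS = """
<settings>
  <action ref="copyAction" column="EMAIL" field="email"
          type="string" indexed="true" stored="true"/>
</settings>
"""

if __name__ == "__main__":
    for action in parse_actions(SETTINGS):
        print(action)
```

Because both the Indexer and the Config tool read the same records, a field's type is declared once, next to the column it comes from.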
  16-20. The Indexer [diagram, built up over five slides]: records from the Oracle DB, driven by the XML settings file, pass through Actions and then Processes before being added to the Solr index. The build-up highlights a CopyAction; a CVAction for CVs, with CVTikaSource and CVSolrSource components; and a MostRecentDateProcess.
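The Actions-then-Processes flow in the diagram can be sketched in a few lines. This is an illustration, not Flax's Indexer: `CopyAction` mirrors the copyAction named on the slides, a process is modelled as any function that derives fields from the document built so far, and `send` stands in for whatever posts a batch to Solr (for example an HTTP POST to Solr's `/update` handler). All names here are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]   # one record as read from the database
Doc = Dict[str, object]   # one Solr document under construction


class CopyAction:
    """Copy a database column into a Solr field unchanged."""

    def __init__(self, column: str, field: str) -> None:
        self.column = column
        self.field = field

    def apply(self, row: Row, doc: Doc) -> None:
        value = row.get(self.column)
        if value is not None:          # skip NULL columns rather than index empty fields
            doc[self.field] = value


# A process derives new fields from values already placed in the document.
Process = Callable[[Doc], None]


def build_document(row: Row, actions: List[CopyAction], processes: List[Process]) -> Doc:
    doc: Doc = {}
    for action in actions:             # actions first: map raw columns to fields
        action.apply(row, doc)
    for process in processes:          # then processes: compute derived fields
        process(doc)
    return doc


def index_rows(rows: Iterable[Row], actions: List[CopyAction], processes: List[Process],
               send: Callable[[List[Doc]], None], batch_size: int = 500) -> int:
    """Build documents and pass them to `send` in batches; return how many were sent."""
    batch: List[Doc] = []
    total = 0
    for row in rows:
        batch.append(build_document(row, actions, processes))
        if len(batch) >= batch_size:
            send(batch)
            total += len(batch)
            batch = []
    if batch:                          # flush the final partial batch
        send(batch)
        total += len(batch)
    return total
```

Keeping actions and processes as small, independent units is what lets new fields be added by configuration rather than by rewriting the indexer.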
  21. The Indexer & The Config Tool [diagram]: the Config tool reads the same XML settings as the Indexer, verifies them and generates the Solr schema .xml used by the Solr index.
  23. The pipeline in code...
      Actions
        <action ref="copyAction" column="EMAIL" field="email"
                type="string" indexed="true" stored="true"/>
      Processes
        <process-map>
          <process field="boost_date" type="tdate"
                   indexed="true" stored="false">
            <beans:bean class="...MostRecentDateProcess">
              ...
              <beans:value>updateddate</beans:value>
              <beans:value>createddate</beans:value>
              ...
          </process>
        </process-map>
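The `MostRecentDateProcess` configured above takes `updateddate` and `createddate` and fills `boost_date`. Its Java implementation is not shown in the deck (the bean's other parameters are elided), so the function below is only a guess at the behaviour the name suggests: take the latest of the listed dates and format it the way Solr date fields expect. The function name and the rule of treating naive datetimes as UTC are assumptions.

```python
from datetime import datetime, timezone
from typing import Dict, Sequence


def most_recent_date(doc: Dict[str, object], source_fields: Sequence[str],
                     target_field: str) -> None:
    """Set doc[target_field] to the latest datetime found in source_fields, if any."""
    dates = []
    for name in source_fields:
        value = doc.get(name)
        if isinstance(value, datetime):
            # Assumption: naive datetimes are already in UTC.
            if value.tzinfo is None:
                value = value.replace(tzinfo=timezone.utc)
            dates.append(value)
    if dates:
        latest = max(dates).astimezone(timezone.utc)
        # Solr date fields take ISO 8601 in UTC with a trailing "Z".
        doc[target_field] = latest.strftime("%Y-%m-%dT%H:%M:%SZ")
```

A record that has never been updated simply falls back to its creation date, so every candidate gets a usable recency value.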
  24. ...and a Solr schema
        <?xml version="1.0" encoding="UTF-8" ?>
        <schema>
          <fields>
            <field name="email" type="string" indexed="true" stored="true" />
            <field name="boost_date" type="tdate" indexed="true" stored="false"/>
          </fields>
        </schema>
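The Config tool turns the field declarations in the Indexer settings into a schema like this one, rejecting unknown types and conflicting declarations. A minimal sketch with `xml.etree.ElementTree`, assuming the settings have already been reduced to `(name, type, indexed, stored)` tuples and that `string`, `text` and `tdate` are the only allowed types (a hypothetical list). A real Solr schema also needs a `<types>` section and other elements, which are omitted here just as they are on the slide.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple

# (field name, Solr type, indexed, stored) as declared in the Indexer settings
FieldDef = Tuple[str, str, bool, bool]


class SchemaConflict(ValueError):
    """Raised when the settings cannot produce a consistent schema."""


def verify(fields: Iterable[FieldDef], known_types: Iterable[str]) -> Dict[str, FieldDef]:
    known = set(known_types)
    seen: Dict[str, FieldDef] = {}
    for fd in fields:
        name, solr_type, _, _ = fd
        if solr_type not in known:
            raise SchemaConflict(f"field {name!r}: unknown type {solr_type!r}")
        # The same field may be declared twice only if both declarations agree.
        if name in seen and seen[name] != fd:
            raise SchemaConflict(f"field {name!r} declared twice with different settings")
        seen[name] = fd
    return seen


def generate_schema(fields: Iterable[FieldDef],
                    known_types: Iterable[str] = ("string", "text", "tdate")) -> str:
    schema = ET.Element("schema")
    fields_el = ET.SubElement(schema, "fields")
    for name, solr_type, indexed, stored in verify(fields, known_types).values():
        ET.SubElement(fields_el, "field", name=name, type=solr_type,
                      indexed=str(indexed).lower(), stored=str(stored).lower())
    return ET.tostring(schema, encoding="unicode")


if __name__ == "__main__":
    print(generate_schema([("email", "string", True, True),
                           ("boost_date", "tdate", True, False)]))
```

Generating the schema from the same settings the Indexer uses means the two cannot drift apart when the database schema changes.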
  25-30. Transforming the UI [six image slides, not captured in this transcript]
  32. Performance
      Many factors can affect search performance...
      ...so we built a test framework:
      • Randomly generated queries based on terms in the index
      • Average query times & number of results recorded
      • Allows direct comparison of, for example, boost functions
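A test framework along these lines needs only two pieces: a reproducible generator of random queries drawn from index terms, and a timer that records average latency and hit counts. The sketch below is an assumption about how such a harness could look, not Flax's framework; `search` is a hypothetical callable that runs one query and returns its hit count (against Solr, for example, a request with `rows=0` whose `numFound` is returned).

```python
import random
import statistics
import time
from typing import Callable, List, Sequence, Tuple

# Runs one query and returns its number of hits.
SearchFn = Callable[[str], int]


def random_queries(terms: Sequence[str], n: int, max_terms: int = 3,
                   seed: int = 0) -> List[str]:
    """Generate n queries of 1..max_terms distinct terms drawn from the index vocabulary."""
    rng = random.Random(seed)              # fixed seed: the same queries for every run
    upper = min(max_terms, len(terms))
    return [" ".join(rng.sample(list(terms), rng.randint(1, upper))) for _ in range(n)]


def benchmark(search: SearchFn, queries: Sequence[str]) -> Tuple[float, float]:
    """Return (mean seconds per query, mean number of hits) over the query set."""
    timings: List[float] = []
    hits: List[int] = []
    for query in queries:
        start = time.perf_counter()
        hits.append(search(query))
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.mean(hits)
```

To compare two boost functions, run `benchmark` twice over the same `random_queries` output, with two search callables that differ only in their boost parameter.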
  33. Performance... much improved!
      • Sub-second searches
      • Only a single server required
      • So fast that the thin-client hardware had to be upgraded, as it became a bottleneck!
      • Still work to be done on improving indexing speed
  35. Issues
      • Users don't always understand their new freedoms
        – Training may be required on free-text search, faceting...
        – Any issues reduce user confidence in new systems
      • Solr features can conflict with each other
        – Make sure you understand how features interact, e.g. recency over relevance, synonyms, stopwords
        – Get the basics working first
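The "recency over relevance" conflict is easy to reproduce on paper. The toy sketch below uses Solr's documented `recip(x,m,a,b) = a/(m*x+b)` function but otherwise simplifies Solr scoring to "text score + weight * recency boost"; the two documents, their relevance scores and the weights are made up purely to show how an over-weighted date boost lets a weak but new match outrank a strong older one.

```python
from typing import Dict, List

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000


def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


def rank(docs: Dict[str, Dict[str, float]], weight: float) -> List[str]:
    """Order document ids by relevance + weight * recency boost (a simplification)."""
    def score(d: Dict[str, float]) -> float:
        return d["relevance"] + weight * recip(d["age_ms"], 3.16e-11, 1, 1)
    return sorted(docs, key=lambda doc_id: score(docs[doc_id]), reverse=True)


# Hypothetical candidates: a strong three-year-old match and a weak brand-new one.
DOCS = {
    "older_strong_match": {"relevance": 2.0, "age_ms": 3 * MS_PER_YEAR},
    "new_weak_match": {"relevance": 1.2, "age_ms": 0.0},
}

if __name__ == "__main__":
    print(rank(DOCS, weight=0.5))   # small boost: relevance still wins
    print(rank(DOCS, weight=3.0))   # large boost: recency overrides relevance
```

Checking a handful of such cases before adding synonyms or stopword changes makes it easier to tell which feature caused a surprising ranking.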
  37. Results & benefits
      • Project delivered on time and under budget
      • Now live across 350 offices in the UK & worldwide
      • 24/7/365 support provided by Lucid Imagination
      • A very happy client!
  39. Conclusions & Lessons Learned
      • What we learned
        – A flexible pipeline is essential
        – Get the basics working first: watch out for feature conflicts
      • What Reed learned
        – User training is important, even if the new system is “simpler”
        – To love Open Source Search...
  40. Conclusions & Lessons Learned
      "The transition to Solr was the latest step in our strategy to develop a truly world-class search application. We believe it provides a robust architecture that meets our future aims, it will scale economically and is a welcome addition to our existing suite of Open Source systems."
  41. The End
      Thanks for listening!
      For more information please contact me: Charlie Hull, Managing Director, Flax
      charlie@flax.co.uk, http://www.flax.co.uk/blog, @FlaxSearch
