Search, APIs, capability management and Sensis's journey


My talk at Lucene Revolution 2011 on how Sensis is using Solr to deliver its API strategy, and on some of the drivers that define how search is managed at Sensis.

Published in: Technology, Travel

Search, APIs, capability management and Sensis's journey

  1. Search, APIs, Capability Management and the Sensis Journey (Craig Rees)
  2. Today’s menu
     • Project background
     • Platform selection
     • Search capability
     • Relevance
     • Architecture
     • Quality management
     • Hurdles
     • What’s next
  3. Sensis
     • Sensis helps Australians find, buy and sell
     • From print directories to a cross-platform lead generator
     • Sensis publishes over 1.8 million business listings
     • Two of the top 10 visited online sites in Australia (WhitePages.com.au and YellowPages.com.au)
  4. Project background
     Business objectives
     • Drive presence in the local search marketplace
     • Open up the largest database of business listings in Australia
     • Reduce the effort required from local search developers
     • Free to use, we are after the reporting
     Technology objectives
     • Develop a total search platform
     • Relevancy testing as part of the development lifecycle
     • A framework to identify problem spaces
     • Manageable platform
     • Continuous deployments
  5. Developer portal
  6. Platform selection
     • Support for the search capability team
     • Structured vs non-structured data
     • Deterministic vs black box
     • Non-proprietary code base
     • Community backing
  7. The Sensis Search capability maturity model*
     Lvl 5, Optimized
     • A/B testing
     • Machine learning
     • External collaboration
     • Multiple contexts
     Lvl 4, Managed
     • Online dashboards
     • Test environments
     • Dynamic search refinements
     • Targets and metrics
     Lvl 3, Monitored
     • Defined team
     • Regular monitoring
     • Static autosuggest
     • Basic linguistics
     Lvl 2, Adhoc
     • Adhoc processes
     • Part time team
     • Static dictionaries
     • Individual led innovation
     Lvl 1, Unmanaged
     • No resources
     • No reporting
     • Out of the box features
     *Courtesy of Pete Crawford & Craig Lonsdale
  8. Context is key
     Diagram labels: Location, Intent, Chronology, Social Graph, Device, Individual
     Diagram bullets: Name, Type, Product, Spatial
  9. Our architecture
     Diagram labels, in extraction order: Business Geo Service Data Solr Mashery Business Name Query Data Search MongoDB Handler Service Index API Publisher Reporting Type Query Service Handler Historical search Data Reporting Events Ontologies
  10. Data staging (same architecture diagram)
  11. Search (same architecture diagram)
  12. API (same architecture diagram)
  13. API proxy (same architecture diagram)
  14. Evolution of search management
     Timeline: Yesterday, Today, Tomorrow
     • Moved from a black box solution to a manageable platform
     • Deliver search improvements without major code changes
     • Understand how results were calculated
     • Identify problems scientifically
     • Continuously tune and test relevance
  15. Problem spaces, quality management & tuning
     • Path Analysis used to identify problem spaces
     • Specific gold sets for each problem space: Intent, Spelling & stemming, Location, Phrase parsing
     • “Gold Sets” used to define overall quality score (TREC)
     • Features signed off only when they make a positive impact to quality score
  16. Search quality analysis and testing
  17. Results examiner
  18. Score analysis
  19. Tuning
  20. Lather, rinse, repeat
  21. Hurdles along the way
     • Data redundancy and homogeneity
     • Solr ranking of rare terms
     • Intent differentiation
     • Contextual synonyms
  22. Where next?
     • Query engine
     • Facets / autosuggest
     • Real time tuning
     • Machine learning
     • Multi term queries
     • Scoring thresholds
     • Content value
  23. Questions?
     Email: craig.rees@sensis.com.au
     www: developers.sensis.com.au
     Twitter: @SensisAPI @ablebagel
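
Slide 15 describes gold sets defining a quality score, with features signed off only when they improve that score. The slides do not say how the score is computed beyond the reference to TREC, so the following is only a minimal sketch under assumptions: it uses mean average precision as a stand-in TREC-style metric, and every name in it (GOLD, baseline, candidate, quality_score, sign_off) as well as the sample queries and listing ids are hypothetical, not taken from the Sensis platform.

```python
from typing import Callable, Dict, List, Set

# Hypothetical types: a gold set maps each query to the listing ids judged relevant,
# and a ranker maps a query to an ordered list of returned listing ids.
GoldSet = Dict[str, Set[str]]
Ranker = Callable[[str], List[str]]


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list against its judged-relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def quality_score(ranker: Ranker, gold: GoldSet) -> float:
    """Mean average precision over the gold set (an assumed metric, not Sensis's)."""
    if not gold:
        return 0.0
    return sum(average_precision(ranker(q), rel) for q, rel in gold.items()) / len(gold)


def sign_off(baseline: Ranker, candidate: Ranker, gold: GoldSet) -> bool:
    """Accept a feature only if it strictly improves the gold-set quality score."""
    return quality_score(candidate, gold) > quality_score(baseline, gold)


# Made-up gold set and canned results standing in for two search configurations.
GOLD: GoldSet = {
    "plumber richmond": {"b1", "b2"},
    "pizza": {"b7"},
}
BASELINE_RESULTS = {"plumber richmond": ["b9", "b1", "b2"], "pizza": ["b7", "b3"]}
CANDIDATE_RESULTS = {"plumber richmond": ["b1", "b2", "b9"], "pizza": ["b7", "b3"]}


def baseline(query: str) -> List[str]:
    return BASELINE_RESULTS.get(query, [])


def candidate(query: str) -> List[str]:
    return CANDIDATE_RESULTS.get(query, [])


if __name__ == "__main__":
    print("baseline score:", quality_score(baseline, GOLD))
    print("candidate score:", quality_score(candidate, GOLD))
    print("sign off:", sign_off(baseline, candidate, GOLD))
```

In practice one such gold set could be kept per problem space (intent, spelling and stemming, location, phrase parsing), so that a regression in one space stays visible even when the overall score improves.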
