Search, APIs, capability management and the Sensis journey - By Craig Rees


See conference video - http://www.lucidimagination.com/devzone/events/conferences/revolution/2011

Earlier this year, Sensis launched its Business Search API, which allows publishers to develop local
search propositions powered by the two million business listings contained in the Australian Yellow
Pages® and White Pages® directories.
This case study will explore Sensis’ strategic direction for search and explain how the framework and
metrics by which search is managed at Sensis were used to define our search roadmap. Key
architectural decisions including our use of Solr and MongoDB will be discussed as well as our
approach to real-time search tuning and quality management.
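The abstract mentions real-time search tuning, and slide 14 lists "deliver search improvements without major code changes". In Solr, one way to do this is through the eDisMax query parser: its per-field weights (`qf`) are ordinary request parameters, so changing them needs neither a re-index nor a redeploy. The sketch below is an assumption, not Sensis' implementation. It builds such a request URL, and the core URL, the function name, and the field names `name`, `category` and `description` are all hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical field names and weights; a real schema defines its own.
DEFAULT_BOOSTS = {"name": 5.0, "category": 2.0, "description": 1.0}


def build_solr_select_url(base_url, user_query, boosts=None, rows=10):
    """Return a Solr /select URL that parses `user_query` with eDisMax.

    The per-field weights are sent in the `qf` parameter, so a tuning change
    is a different request parameter rather than a code change or re-index.
    """
    boosts = DEFAULT_BOOSTS if boosts is None else boosts
    # e.g. {"name": 5.0} -> "name^5"
    qf = " ".join(f"{field}^{weight:g}" for field, weight in boosts.items())
    params = {
        "q": user_query,
        "defType": "edismax",  # use the Extended DisMax query parser
        "qf": qf,              # fields to search, each with a boost
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    core = "http://localhost:8983/solr/listings"  # hypothetical core URL
    baseline = build_solr_select_url(core, "plumber richmond")
    # Hypothetical candidate tuning: weight category matches above name matches.
    candidate = build_solr_select_url(
        core,
        "plumber richmond",
        boosts={"name": 3.0, "category": 4.0, "description": 1.0},
    )
    for url in (baseline, candidate):
        print(parse_qs(urlsplit(url).query)["qf"][0])
```

The same weights could also live in a request handler's defaults in `solrconfig.xml`. Sending them per request makes it easy to compare a baseline and a candidate side by side.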

Published in: Technology

  1. Search, APIs, Capability Management and the Sensis Journey (Craig Rees)
  2. Today's menu
       • Project background
       • Platform selection
       • Search capability
       • Relevance
       • Architecture
       • Quality management
       • Hurdles
       • What's next
  3. Sensis
       • Sensis helps Australians find, buy and sell
       • From print directories to a cross-platform lead generator
       • Sensis publishes over 1.8 million business listings
       • Two of the top 10 most visited online sites in Australia (WhitePages.com.au and YellowPages.com.au)
  4. Project background
       • Business objectives
          ◦ Drive presence in the local search marketplace
          ◦ Open up the largest database of business listings in Australia
          ◦ Reduce the effort required from local search developers
          ◦ Free to use; we are after the reporting
       • Technology objectives
          ◦ Develop a total search platform
          ◦ Relevancy testing as part of the development lifecycle
          ◦ A framework to identify problem spaces
          ◦ Manageable platform
          ◦ Continuous deployments
  5. Developer portal
  6. Platform selection
       • Support for the search capability team
       • Structured vs. unstructured data
       • Deterministic vs. black box
       • Non-proprietary code base
       • Community backing
  7. The Sensis search capability maturity model (courtesy of Pete Crawford & Craig Lonsdale)
       • Level 1, Unmanaged: no resources; no reporting; out-of-the-box features
       • Level 2, Ad hoc: ad hoc processes; part-time team; static dictionaries; individual-led innovation
       • Level 3, Monitored: defined team; regular monitoring; static autosuggest; basic linguistics
       • Level 4, Managed: online dashboards; test environments; dynamic search refinements; targets and metrics
       • Level 5, Optimized: A/B testing; machine learning; external collaboration; multiple contexts
  8. Context is key
       • Intent
       • Name
       • Type
       • Product
       • Spatial
       • Location
       • Chronology
       • Social graph
       • Individual
       • Device
  9. Our architecture (diagram). Components: historical search data, MongoDB, business data, geo service, index, name query handler, type query handler, search service, reporting service, reporting events, publisher, Solr, API, ontologies, Mashery
  10. Data staging (same architecture diagram as slide 9)
  11. Search (same architecture diagram as slide 9)
  12. API (same architecture diagram as slide 9)
  13. API proxy (same architecture diagram as slide 9)
  14. Evolution of search management (yesterday, today, tomorrow)
       • Moved from a black-box solution to a manageable platform
       • Deliver search improvements without major code changes
       • Understand how results were calculated
       • Identify problems scientifically
       • Continuously tune and test relevance
  15. Problem spaces, quality management & tuning
       • Path analysis used to identify problem spaces
       • "Gold sets" used to define an overall quality score (TREC)
       • Features signed off only when they make a positive impact on the quality score
       • Specific gold sets for each problem space:
          ◦ Intent
          ◦ Spelling & stemming
          ◦ Location
          ◦ Phrase parsing
  16. Search quality analysis and testing
  17. Results examiner
  18. Score analysis
  19. Tuning
  20. Lather, rinse, repeat
  21. Hurdles along the way
       • Data redundancy and homogeneity
          ◦ Solr ranking of rare terms
          ◦ Intent differentiation
          ◦ Contextual synonyms
  22. Where next?
       • Query engine
          ◦ Facets / autosuggest
          ◦ Real-time tuning
          ◦ Machine learning
          ◦ Multi-term queries
          ◦ Scoring thresholds
          ◦ Content value
  23. Questions?
       Email: craig.rees@sensis.com.au
       www: developers.sensis.com.au
       Twitter: @SensisAPI @ablebagel
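Slide 15 describes "gold sets" that produce an overall TREC-style quality score, with features signed off only when they improve that score. The slide does not say which TREC measure was used. The sketch below assumes binary relevance judgements and mean average precision (MAP), one of the standard TREC measures. The function names, listing ids and queries are hypothetical.

```python
from statistics import mean


def average_precision(ranked_ids, relevant_ids):
    """TREC-style average precision for a single query.

    Adds precision@k at every rank k holding a (first-seen) relevant result,
    then divides by the number of relevant results in the gold set, so
    relevant results missing from the ranking count against the score.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    seen = set()
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids and doc_id not in seen:
            seen.add(doc_id)
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant_ids)


def mean_average_precision(run, gold_set):
    """Mean of per-query AP over every query in the gold set.

    `gold_set` maps query -> set of relevant listing ids (hypothetical format);
    `run` maps query -> ranked list of listing ids returned by the engine.
    """
    return mean(average_precision(run.get(q, []), rel) for q, rel in gold_set.items())


def sign_off(baseline_run, candidate_run, gold_set, min_gain=0.0):
    """Accept a candidate feature only if it raises the gold-set score."""
    before = mean_average_precision(baseline_run, gold_set)
    after = mean_average_precision(candidate_run, gold_set)
    return after - before > min_gain, before, after


if __name__ == "__main__":
    # Hypothetical gold set and result lists, for illustration only.
    gold = {"pizza carlton": {"L1", "L2"}, "plumber": {"L3"}}
    baseline = {"pizza carlton": ["L9", "L1", "L2"], "plumber": ["L3", "L8"]}
    candidate = {"pizza carlton": ["L1", "L2", "L9"], "plumber": ["L3"]}
    accepted, before, after = sign_off(baseline, candidate, gold)
    print(f"MAP {before:.3f} -> {after:.3f}, accepted={accepted}")
```

Slide 15 also calls for a separate gold set per problem space (intent, spelling & stemming, location, phrase parsing). Under this sketch, that would mean running `sign_off` once per gold set rather than against one pooled set.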
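Slide 15 says path analysis was used to identify problem spaces, but it does not say how. One common approach, assumed here and not taken from the talk, is to mine session logs for queries that are immediately reformulated without a click. A frequently repeated (query, next query) pair can then point at a spelling, location or intent gap. The log schema (an ordered list of `(query, clicked)` events per session) and the function name are hypothetical.

```python
from collections import Counter


def reformulation_counts(sessions):
    """Count (query, next_query) pairs where the user searched again
    without clicking a result.

    Hypothetical log schema: `sessions` is a list of sessions, and each
    session is an ordered list of (query, clicked) events.
    """
    counts = Counter()
    for events in sessions:
        # Walk consecutive pairs of events within one session.
        for (query, clicked), (next_query, _) in zip(events, events[1:]):
            if not clicked and next_query != query:
                counts[(query, next_query)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sessions, for illustration only.
    sessions = [
        [("plumbr", False), ("plumber", True)],
        [("plumbr", False), ("plumber", False)],
        [("pizza", True), ("pizza carlton", True)],
    ]
    for (query, next_query), n in reformulation_counts(sessions).most_common():
        print(f"{query!r} -> {next_query!r}: {n}")
```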
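Slide 21 lists "Solr ranking of rare terms" as a hurdle caused by redundant, homogeneous data. In the Lucene 3.x era, Solr's default similarity was classic TF-IDF, where a term's inverse document frequency is 1 + ln(numDocs / (docFreq + 1)). Later Lucene releases default to BM25 instead. The sketch below computes that classic formula. The listing count is taken from slide 3; the two document frequencies are hypothetical. With these numbers, a token found in only a few listings (for example a misspelling) gets an idf about 2.5 times that of a common category word, so a single rare token can dominate the ranking.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF from Lucene's classic TF-IDF similarity (DefaultSimilarity in
    Lucene 3.x): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


if __name__ == "__main__":
    num_docs = 1_800_000  # roughly the listing count quoted on slide 3
    # Hypothetical document frequencies, for illustration only.
    for term, df in [("plumber", 20_000), ("plumbng", 3)]:
        print(f"{term:8s} df={df:>6} idf={classic_idf(df, num_docs):.2f}")
```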
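Slide 7 places A/B testing at the top level of the maturity model. The talk does not describe how experiments were evaluated. A standard choice for comparing click-through rates between a control and a variant is the two-proportion z-test, sketched below under that assumption. The function name and click counts are hypothetical.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rate between
    variant A (control) and variant B (candidate).

    Returns (z, p_value), using the pooled proportion under the null
    hypothesis that both variants share the same click-through rate.
    """
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        raise ValueError("no variation in clicks; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    z, p = two_proportion_z_test(clicks_a=100, views_a=1000, clicks_b=150, views_b=1000)
    print(f"z={z:.2f}, p={p:.4f}")
```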
