Challenges for Industrial-strength Information Retrieval on Databases

Presentation for the KARS 2017 paper on Spinque technology:
Implementing keyword search and other IR tasks on top of relational engines has become viable in practice, especially thanks to high-performance column-store technology. Supporting complex combinations of structured and unstructured search in real-world heterogeneous data spaces, however, requires more than “just” IR-on-DB. In this work, we walk the reader through our industrial-strength solution to this challenge and its application to a real-world scenario.
By treating structured and unstructured search as first-class citizens of the same computational platform, much of the integration effort is pushed from the application level down to the data-management level. Combined with a visual design environment, this makes it possible to model complex search engines without programming.
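
A minimal sketch of what that can look like, using Python's built-in sqlite3 module rather than Spinque's platform: a single SQL query combines a text-style prefix match with structured stock filtering and popularity ranking, as in the autocompletion scenario from the slides. The `products` table, its columns (`name`, `in_stock`, `requests`) and all rows are assumptions made up for illustration.

```python
import sqlite3


def suggest(conn, prefix, limit=3):
    """Return in-stock product names starting with `prefix`, most requested first."""
    rows = conn.execute(
        """
        SELECT name
        FROM products
        WHERE substr(name, 1, length(:prefix)) = :prefix  -- unstructured: prefix match
          AND in_stock = 1                                -- structured: stock filter
        ORDER BY requests DESC, name                      -- ranking by popularity
        LIMIT :limit
        """,
        {"prefix": prefix.lower(), "limit": limit},
    ).fetchall()
    return [name for (name,) in rows]


def demo_products():
    """Build an in-memory database with hypothetical product data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, in_stock INTEGER, requests INTEGER)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [
            ("iphone 8", 1, 90),
            ("iphone 7", 1, 50),
            ("iphone 6s", 1, 20),
            ("iphone 5c", 0, 10),  # out of stock, so never suggested
            ("ipad", 1, 70),       # in stock but does not match the prefix
        ],
    )
    return conn
```

With this data, `suggest(demo_products(), "ipho")` returns `['iphone 8', 'iphone 7', 'iphone 6s']`. Filtering and ranking happen in the same query as the match, so no application-level merging is needed.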

Published in: Internet

  1. 1. Challenges for industrial-strength Information Retrieval on Databases R. Cornacchia, M. Hildebrand, A.P. de Vries, F. Dorssers KARS2017 - 21 March 2017, Venice, IT
  2. 2. About Spinque ○ Since 2010 ○ Spin-off of CWI, Amsterdam ○ “Search by Strategy”
  3. 3. Outline 1. Search is everywhere 2. Tailored search is expected 3. Tailored search needs modelling 4. Search modelling by information specialists 5. Search modelling needs flexible IR & DB 6. IR on DB: it works
  4. 4. Search is everywhere Real world scenarios Technical Desktop Coding content assistant Product recommendation Personalised newsfeed
  5. 5. Let’s pick a simple one: autocompletion. Suggestions for “ipho|”: iphone 7, iphone 5c, iphone 6s. “Autocompletion is trivial” .. not so fast! Tailored search is expected
  6. 6. Autocompletion. Suggestions for “ipho|”: iphone 7, iphone 5c, iphone 6s. Basic - products ○ Any matching term from the index ○ Suggest products Tailored search is expected
  7. 7. Autocompletion. Suggestions for “ipho|”: iphone 7, iphone 5c, iphone 6 cases. Basic - products & categories ○ Any matching term from the index ○ Suggest products & categories Tailored search is expected
  8. 8. Autocompletion. Suggestions for “ipho|”: iphone 7, iphone 6 cases, iphone 6s. Filtered ○ Any matching term from the index ○ “iPhone 5c” out of stock Tailored search is expected
  9. 9. Autocompletion. Suggestions for “ipho|”: iphone 8, iphone 7, iphone 6 cases. Filtered & ranked ○ “iPhone 5c” out of stock ○ “iPhone 8” the most requested Tailored search is expected
  10. 10. Autocompletion. Suggestions for “ipho|”: iphone cases, iphone adapters, iphone 7. Exploratory ○ First suggest categories.. ○ .. then products Tailored search is expected
  11. 11. Autocompletion. Suggestions for “ipho|”: iphone 7 cases, iphone 7 adapters, iphone 8. Personalised ○ I already own an “iPhone 7” ○ Suggest compatible accessories ○ Suggest upgrade Tailored search is expected
  12. 12. What if my search API isn’t enough? Tailored search needs modelling. Suggestions for “ipho|” from <your favourite autocompletion>: iphone 7 cases, iphone 7 adapters, iphone 8 ○ Out-of-the-box API may fall short ○ Build custom search API ○ Who? How? http://localhost:8983/solr/suggest?q=ipho
  13. 13. How do we build custom search APIs? Search modelling by information specialists data modelling search modelling Spinque: Empower the information specialist
  14. 14. Empowering the information specialist data modelling search modelling Search modelling by information specialists
  15. 15. Data modelling Search modelling needs flexible IR & DB business transactions social media
  16. 16. Search modelling standard autocompletion custom autocompletion Search modelling by information specialists http://spinque/suggest?q=ipho http://spinque/suggest_ranked?q=ipho
  17. 17. The IR & DB challenge Search modelling needs flexible IR & DB ○ IR & DB both needed even for trivial tasks ○ Different technologies / focus ○ How / where to integrate task results? ○ Do they stay black boxes? ○ Can we express them in the same platform, and when does this make sense? http://spinque/suggest_ranked?q=ipho
  18. 18. Text retrieval by strategy Search modelling needs flexible IR & DB text retrieval.. ..is just another DB query ○ strategy-driven “collection” and “documents” ○ on-demand indexing ○ it takes just standard SQL
  19. 19. Graph DB by strategy Search modelling needs flexible IR & DB Visual modelling, Relational Algebra, Graph. Triples (subject, property, object): (123, name, pen), (123, availability, in stock), (123, price, 9.99)
  20. 20. Graph DB by strategy Search modelling needs flexible IR & DB we want DB & ranking together & seamlessly what if this.. ..could work on this? Triples with probability (subject, property, object, p): (123, name, pen, 1.0), (123, availability, in stock, 0.8), (123, price, 9.99, 1.0)
  21. 21. Rank. Everything. Always. Search modelling needs flexible IR & DB rank products.. ..get ranked orders and customers (Fuhr, Rölleke, 1997, A probabilistic relational algebra for the integration of IR and DB) SQL: SELECT g.obj, (o.p * g.p) AS p FROM graph g, ranked_orders o WHERE g.subj = o.id AND g.rel = 'orderedBy'; PRA: PROJECT [$3] ( JOIN INDEPENDENT [$1=$1] ( SELECT [$2='orderedBy'] (g), ranked_orders ) )
  22. 22. What about efficiency? IR on DB: it works ○ 1.1M docs, 2.3GB, on a 4-core i7-3770s, 16GB RAM, 256GB SSD: find documents in 20ms ○ 8M lots, 25K auctions (10GB raw data), on a VM (8 CPUs) on Xeon E5-2620, 16GB RAM, 256GB SSD: find lots in 150ms
  23. 23. What about efficiency? IR on DB: it works pre-compute what can be pre-computed.. ..but do it query-driven ○ Index on demand ○ Cache result of relational expressions ○ Algebraic analysis to determine cache
  24. 24. What about efficiency? IR on DB: it works choose it carefully.. ..then enjoy ○ Main benefits of IR on DB: ○ IR as a DB optimisation problem ○ No custom extensions, no vendor lock-in ○ Column-store, CPU-friendly DB engine “Hey, we made our join 20% faster. You are welcome.”
  25. 25. ○ If you just need text retrieval on documents ○ A Lucene-like engine will serve you well ○ Information needs tend to be more complex ○ Solving them at application level: common and painful ○ A one-platform approach pays off IR on DB: when does it make sense? IR on DB: it works
  26. 26. Conclusions 1. Search is everywhere ○ In the real world.. 2. Tailored search is expected ○ ..no two searches are alike. 3. Tailored search needs modelling ○ Someone will put effort into it.. 4. Search modelling by information specialists ○ ..who better than the right person for the job? 5. Search modelling needs flexible IR & DB ○ Who takes care of the low-level details then? 6. IR on DB: it works ○ The right tools. The right architecture.
  27. 27. ○ Live updates ○ ACID transactions overhead ○ Scale out ○ It’s more than “just an inverted file” to be distributed ○ Even better support for information specialists ○ Strategy auto-tuning Challenges ahead
  28. 28. supporting information specialists Don’t program search engines, design them
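
Slide 18's claim that text retrieval "is just another DB query" can be illustrated with a rough sketch using Python's built-in sqlite3 module. The `docs` and `postings` tables, the whitespace tokenizer and the plain term-frequency score are all assumptions chosen for brevity; the slides do not say which ranking model or schema Spinque uses.

```python
import sqlite3
from collections import Counter


def build_index(conn):
    """Derive a term-frequency 'postings' table from the docs table (indexing on demand)."""
    conn.execute("DROP TABLE IF EXISTS postings")
    conn.execute("CREATE TABLE postings (term TEXT, doc_id INTEGER, tf INTEGER)")
    for doc_id, body in conn.execute("SELECT id, body FROM docs").fetchall():
        counts = Counter(body.lower().split())  # naive whitespace tokenizer
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(term, doc_id, tf) for term, tf in counts.items()],
        )


def search(conn, query, limit=10):
    """Rank documents by the summed term frequency of the query terms, in plain SQL."""
    terms = sorted(set(query.lower().split()))
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT doc_id, SUM(tf) AS score
        FROM postings
        WHERE term IN ({placeholders})
        GROUP BY doc_id
        ORDER BY score DESC, doc_id
        LIMIT ?
    """
    return conn.execute(sql, (*terms, limit)).fetchall()


def demo_docs():
    """Build an in-memory database with three made-up documents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?)",
        [(1, "blue pen with blue ink"), (2, "red pen"), (3, "paper notebook")],
    )
    build_index(conn)
    return conn
```

Here `search(demo_docs(), "blue pen")` returns `[(1, 3), (2, 1)]`: document 1 matches "blue" twice and "pen" once, document 2 matches "pen" once. Because the index is an ordinary table, the same query can be joined with any structured data in the database.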

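
The SQL translation of the probabilistic join on slide 21 can be tried out with Python's built-in sqlite3 module. This is only a sketch: the query is the one from the slide (plus an ORDER BY added here), while the column types, the `contains` edge and every row and score are made-up assumptions.

```python
import sqlite3


def ranked_customers(conn):
    """Propagate order scores to customers along 'orderedBy' edges.

    Under the independence assumption, the probability of a joined tuple
    is the product of the probabilities of its two inputs.
    """
    query = """
        SELECT g.obj, (o.p * g.p) AS p
        FROM graph g, ranked_orders o
        WHERE g.subj = o.id AND g.rel = 'orderedBy'
        ORDER BY 2 DESC, 1  -- added for the demo: by score, then by name
    """
    return conn.execute(query).fetchall()


def demo_graph():
    """Build an in-memory database with hypothetical triples and order scores."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE graph (subj TEXT, rel TEXT, obj TEXT, p REAL);
        CREATE TABLE ranked_orders (id TEXT, p REAL);
        INSERT INTO graph VALUES
            ('order1', 'orderedBy', 'alice', 1.0),
            ('order2', 'orderedBy', 'bob',   0.5),
            ('order1', 'contains',  'pen',   1.0);
        INSERT INTO ranked_orders VALUES ('order1', 0.5), ('order2', 0.5);
        """
    )
    return conn
```

With this data, `ranked_customers(demo_graph())` returns `[('alice', 0.5), ('bob', 0.25)]`. Note that the SQL as written on the slide does not merge duplicates: a customer with several ranked orders would appear once per order.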