Full-text search in Couchbase 5.0: how it works and what it can do – Couchbase Connect New York 2017

Due to the lack of built-in full-text search, many NoSQL database customers export data from their operational databases into specialized text search engines that are installed and managed separately. This approach is complex to manage, costly to operate, and makes application queries across both repositories difficult and error-prone. The Couchbase Search service puts full-text search back where it belongs – inside your operational database.
In this session by Couchbase Search experts, you’ll learn what you can do in Couchbase with full-text search, how to map your data buckets into useful, performant full-text indexes, the kinds of searches you can perform, and the implications of a distributed, sharded full-text index. We’ll also discuss the Couchbase Search product roadmap and the features that you can expect to be added to this service in upcoming releases.

Published in: Software

  1. ©2017 Couchbase Inc. FULL-TEXT SEARCH IN COUCHBASE 5.0: HOW IT WORKS AND WHAT IT CAN DO
  2. Marty Schoch, Senior Software Engineer for Search, marty@couchbase.com, @mschoch
  3. Why? • What is it? • How does it work? • Design • Demo • Best Practices • Status / Roadmap / What’s Next
  4. Why
  5. Couchbase Users Need to Search their Documents
  6. Dedicated Search Solutions
     ✗ Provision
     ✗ Install
     ✗ Integrate
     ✗ Transfer data
     ✗ Learn
     ✗ Manage
     ✗ Troubleshoot
  7. Why Full-Text Search?
     • simple
     • 80/20 of features
     • integrated
  8. What is it?
  9. What is Full-Text Search?
  10. What is Full-Text Search?
      • Result Text Snippets
      • Search Term Highlighting
  11. How does it work?
  12. How does it work?
      • Inverted Indexes
      • Language Awareness
      • Relevance Scoring
  13. Inverted Index (Term in Document: Document ID Posting List)
      • my: Doc 1, Doc 2, Doc 3
      • luxuri: hotel_1243, Doc 2, Doc 81
      • has: Doc 1, Doc 2, Doc 3
      • small: hotel_7399, Doc 81
      • …
  14. Inverted Index (Indexed Terms: Document ID Postings List)
      • cozi: hotel_1289, hotel_3376, hotel_5022, hotel_9994
      • luxuri: hotel_0092, hotel_1289, hotel_8989
      • small: hotel_3376
      • spacious: hotel_0092, hotel_1289, hotel_3376, hotel_5022, hotel_8989, hotel_9994
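
The postings table on the slide above can be reproduced with a small in-memory structure. This is only a sketch of the general inverted-index idea, not how the Couchbase Search service stores its data; the per-document term lists and the helper names `build_inverted_index` and `search_all` are assumptions made up for the illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_all(index, terms):
    """Return IDs of documents containing every term (postings intersection)."""
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

# Hypothetical, already-stemmed terms per document, chosen so that the
# resulting postings lists match the slide.
docs = {
    "hotel_0092": ["luxuri", "spacious"],
    "hotel_1289": ["cozi", "luxuri", "spacious"],
    "hotel_3376": ["cozi", "small", "spacious"],
    "hotel_5022": ["cozi", "spacious"],
    "hotel_8989": ["luxuri", "spacious"],
    "hotel_9994": ["cozi", "spacious"],
}

index = build_inverted_index(docs)
print(index["luxuri"])
print(search_all(index, ["cozi", "luxuri"]))
```

A lookup is a dictionary access on the term, so query cost depends on the length of the postings lists rather than on the number of documents scanned.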
  15. Language Aware
      • Document contains… “Beauty” → Text Analysis (stemming) → Indexed as… “beauti”
      • User searches… “Beautiful” → Text Analysis (stemming) → Searched as… “beauti”
      • ✔ Match!
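
To make the stemming step concrete, here is a deliberately tiny stemmer that applies the same analysis at index time and at query time. It is a toy with two hand-written suffix rules, assumed purely for illustration; real analyzers use full stemming algorithms, and `toy_stem` is not part of any Couchbase API.

```python
def toy_stem(word):
    """Toy stemmer: lowercase, strip a trailing 'ful', turn a final 'y' into 'i'."""
    w = word.lower()
    if w.endswith("ful") and len(w) > 5:
        w = w[:-3]
    if w.endswith("y") and len(w) > 2:
        w = w[:-1] + "i"
    return w

indexed_term = toy_stem("Beauty")      # what the document is indexed as
searched_term = toy_stem("Beautiful")  # what the user's query is searched as
print(indexed_term, searched_term, indexed_term == searched_term)
```

The important property is that both sides go through the same function, so different surface forms collapse to one indexed term.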
  16. Relevance Scoring
  17. TF/IDF Scoring
      • TF = Term Frequency
        • How often does a term occur in a document?
        • More often yields a higher score
      • IDF = Inverse Document Frequency
        • How many documents have this term?
        • More documents yields a lower score (because it means the term is more common)
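
The two bullets above can be turned into a small scoring function. This sketch uses one common textbook variant (raw term count multiplied by `log(N / df) + 1`); the exact formula used by the Search service is not given in the slides, so treat the weighting, the function name `tf_idf`, and the sample corpus as assumptions.

```python
import math

def tf_idf(term, doc_terms, corpus):
    """Score one term for one document: term frequency times inverse document frequency."""
    tf = doc_terms.count(term)                  # how often the term occurs in this document
    df = sum(1 for d in corpus if term in d)    # how many documents contain the term
    if tf == 0 or df == 0:
        return 0.0
    idf = math.log(len(corpus) / df) + 1.0      # rarer terms get a larger weight
    return tf * idf

# Hypothetical corpus of already-analyzed documents.
corpus = [
    ["cozi", "spacious"],
    ["luxuri", "spacious", "spacious"],
    ["small", "spacious"],
]

rare = tf_idf("small", corpus[2], corpus)       # appears in 1 of 3 documents
common = tf_idf("spacious", corpus[2], corpus)  # appears in all 3 documents
repeated = tf_idf("spacious", corpus[1], corpus)
print(rare > common, repeated > common)
```

With equal term frequency the rarer term scores higher, and for the same term the document that repeats it scores higher.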
  18. Index Mapping
  19. Index Mapping
      • Exclude fields/sub-sections
      • Configure indexing behavior by type of document (beer vs brewery)
      • Configure indexing behavior per-field
      • Index Fields
      • Nested structures
      • Arrays
  20. Precision vs. Recall
      • Precision: ratio of document matches that are actually relevant
      • Recall: ratio of relevant documents that are actually matched
      • High quality results depend on performing the right analysis for your text
      • Beware: increasing precision may reduce recall (and vice versa)
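
Both ratios are easy to compute once the matched set and the truly relevant set are known. The document IDs below are invented for the example, and `precision_recall` is a hypothetical helper rather than anything the Search service exposes.

```python
def precision_recall(matched, relevant):
    """Return (precision, recall) for a set of matched and a set of relevant document IDs."""
    matched, relevant = set(matched), set(relevant)
    hits = matched & relevant
    precision = len(hits) / len(matched) if matched else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query outcome: 4 documents matched, 3 documents were actually relevant.
matched = ["hotel_1", "hotel_2", "hotel_3", "hotel_4"]
relevant = ["hotel_1", "hotel_2", "hotel_9"]

precision, recall = precision_recall(matched, relevant)
print(precision, recall)
```

Here two of the four matches are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67). Loosening the analysis so that `hotel_9` also matches would raise recall but would typically pull in more irrelevant matches as well.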
  21. Design
  22. FTS Design / Index Partitioning
      • bucket partitions: 0, 1, 2, 3, 4, …, 1021, 1022, 1023 (1024 vbuckets)
      • FTS nodes: X, Y, Z
      • index partitions (groups of vbuckets): A (0-399), B (400-799), C (800-1023)
      • assign to FTS nodes; replicas, too
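
The grouping on the slide (1024 vbuckets split into 0-399, 400-799, 800-1023) can be sketched as below. The maximum group size of 400 is read off the slide's ranges; the round-robin placement and the "replica on the next node" rule are assumptions for illustration only, since the slides do not describe the actual placement algorithm.

```python
def partition_vbuckets(num_vbuckets, max_per_partition):
    """Split vbucket IDs 0..num_vbuckets-1 into contiguous groups."""
    return [
        range(start, min(start + max_per_partition, num_vbuckets))
        for start in range(0, num_vbuckets, max_per_partition)
    ]

def place(partition_names, nodes):
    """Assign primaries round-robin; put each replica on the next node (assumed rule)."""
    placement = {}
    for i, name in enumerate(partition_names):
        placement[name] = {
            "primary": nodes[i % len(nodes)],
            "replica": nodes[(i + 1) % len(nodes)],
        }
    return placement

groups = partition_vbuckets(1024, 400)
placement = place(["A", "B", "C"], ["X", "Y", "Z"])
for name, group in zip("ABC", groups):
    print(name, f"{group[0]}-{group[-1]}", placement[name])
```

With at least two nodes, this rule never puts a replica on the same node as its primary, which is the property that makes failover useful.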
  23. FTS Design / Indexing
      • Diagram: three couchbase nodes, three FTS nodes
      • DCP streams for incremental index updates
  24. FTS Design / Querying
      • Diagram: your application, REST, three FTS nodes
      • a query sent to any FTS node…
      • …is scatter / gathered to the other FTS nodes
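
A scatter/gather query fans the same request out to every index partition and merges the partial hit lists into one ranked result. The sketch below models partitions as in-memory dictionaries and uses a thread pool for the fan-out; the data, scores, and function names are hypothetical, and a real deployment would make network calls between FTS nodes instead.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_partition(partition, term):
    """Return (score, doc_id) hits for one partition; partition maps doc_id -> {term: score}."""
    return [(scores[term], doc_id) for doc_id, scores in partition.items() if term in scores]

def scatter_gather(partitions, term, k):
    """Query all partitions concurrently, then keep the k best hits overall."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partial_results = list(pool.map(lambda p: search_partition(p, term), partitions))
    merged = [hit for hits in partial_results for hit in hits]
    return heapq.nlargest(k, merged)

# Hypothetical per-partition data with precomputed scores.
partitions = [
    {"hotel_0092": {"luxuri": 1.2}},
    {"hotel_1289": {"luxuri": 2.5, "cozi": 0.7}},
    {"hotel_8989": {"luxuri": 0.9}},
]

top_hits = scatter_gather(partitions, "luxuri", k=2)
print(top_hits)
```

Because each partition only returns its own hits, the coordinating node needs just a merge step, which is why a query can be sent to any node.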
  25. FTS Designed for Scalability and Availability
      ✔ auto index partitioning (hash partitioning)
      ✔ to multiple FTS nodes (auto-placement)
      ✔ rebalance (add/swap/remove)
      ✔ scatter/gather queries (partial results ok)
      ✔ replicas (only primaries queried)
      ✔ failover (replicas promoted)
  26. Live Demo
  27. Best Practices
  28. Use Explicit Mappings In Production
      • { “type”: “brewery”, “random_number”: 4, “edible”: false }
      • Dynamic mappings are great, until…
      • Developer adds one small field: “comments”: 4k of text (alongside “random_number”: 4, “edible”: false)
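
The risk described on this slide can be shown by counting how much text ends up in the index under each kind of mapping. This is a simplified model, not the Search service's mapping format: `fields_to_index` and `indexed_text_size` are hypothetical helpers, and the 4096-character `comments` value stands in for the slide's "4k of text".

```python
def fields_to_index(doc, explicit_fields=None):
    """Select fields to index; explicit_fields=None models a dynamic mapping (index everything)."""
    if explicit_fields is None:
        return dict(doc)
    return {name: value for name, value in doc.items() if name in explicit_fields}

def indexed_text_size(fields):
    """Total number of characters of text that would be analyzed and indexed."""
    return sum(len(value) for value in fields.values() if isinstance(value, str))

# The brewery document from the slide after a developer adds a large field.
doc = {
    "type": "brewery",
    "random_number": 4,
    "edible": False,
    "comments": "x" * 4096,  # stand-in for 4k of text
}

dynamic_size = indexed_text_size(fields_to_index(doc))
explicit_size = indexed_text_size(fields_to_index(doc, explicit_fields={"type"}))
print(dynamic_size, explicit_size)
```

Under the dynamic model the new field is indexed without anyone deciding that it should be; under the explicit model the index only grows when the mapping is changed on purpose.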
  29. Always Use Index Aliases
  30. Always Use Index Aliases
      • Diagram: /users, /usersV1, /usersV2
      • /usersV2: Indexing 55%
      • Atomic Switch to /usersV2
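
The alias pattern on this slide is an extra level of indirection: the application always queries the alias, and the alias is repointed only once the new index has finished building. The class below is an in-memory sketch of that idea, assuming (as the diagram suggests) that /users initially targets /usersV1; `AliasRegistry` is hypothetical and is not the Couchbase API for managing aliases.

```python
class AliasRegistry:
    """Minimal alias table: one alias name maps to exactly one index name."""

    def __init__(self):
        self._targets = {}

    def point(self, alias, index_name):
        # A single assignment: readers see either the old or the new target, never a mix.
        self._targets[alias] = index_name

    def resolve(self, alias):
        return self._targets[alias]

aliases = AliasRegistry()
aliases.point("users", "usersV1")

build_progress = {"usersV2": 55}            # new index still building, as on the slide
before = aliases.resolve("users")           # queries keep hitting the old index

build_progress["usersV2"] = 100             # build finished
if build_progress["usersV2"] == 100:
    aliases.point("users", "usersV2")       # the atomic switch
after = aliases.resolve("users")
print(before, after)
```

The application code never changes: it keeps asking for `users`, and only the alias target moves.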
  31. Status / Roadmap / What’s Next
  32. Project Status
      • FTS will be GA in 5.0
      • Please try the beta: http://www.couchbase.com/download
  33. Thank You!