
Full-text search: how it works and what it can do – Couchbase Connect 2016

Full-text search is a developer preview feature for Couchbase Server. In this presentation by Couchbase Server engineers, you’ll learn what you can do with full-text search, how to map your buckets into useful, performant full-text indexes, the kinds of searches you can perform, and the implications of a distributed, sharded full-text index.

  1. ©2016 Couchbase Inc. Couchbase FullText Search (FTS)
  2. about your speakers: Marty Schoch, Steve Yen
  3. agenda • why? • what is it? • how does it work? • how does it scale? • demo • best practices • status / roadmap / what’s next
  4. agenda: why?
  5. Couchbase users need to search their documents
  6. dedicated search solutions: ✗ provision ✗ install ✗ integrate ✗ transfer data ✗ learn ✗ manage ✗ troubleshoot
  7. why FullText Search? simple • integrated • 80/20 of features
  8. agenda: what is it?
  9. what’s FullText Search?
  12. search results: result text snippets, highlighted search terms
  13. agenda: how does it work?
  14. how does it work? • Inverted indexes • Language awareness • Scoring
  15. inverted index (terms → where found): my: Doc 1, Doc 2, Doc 3 • dog: Doc 1, Doc 2, Doc 81 • has: Doc 1, Doc 2, Doc 3 • fleas: Doc 1, Doc 81 • …
  16. language aware: the document contains “Beauty”, indexed as “beauti”; the user searches “Beautiful”, searched as “beauti”; text analysis (stemming) is applied to both, so ✔ match!
  17. scoring
  18. TF/IDF scoring • TF = Term Frequency: how often does a term occur in a document? More often yields a higher score. • IDF = Inverse Document Frequency: how many documents have this term? More documents yields a lower score (because it means the term is more common).
  19. index mapping
  20. index mapping • Exclude fields/sub-sections • Configure indexing behavior by type of document (beer vs. brewery) • Configure indexing behavior per field • Index fields • Nested structures • Arrays
  21. precision vs. recall • Precision: the fraction of matched documents that are actually relevant • Recall: the fraction of relevant documents that are actually matched • High-quality results depend on performing the right analysis for your text • Beware: increasing precision may reduce recall (and vice versa)
  22. agenda: how does it scale?
  36. how does it scale? ✔ auto index partitioning (hash partitioning) ✔ to multiple FTS nodes (auto-placement) ✔ rebalance (add/swap/remove) ✔ scatter/gather queries (partial results OK) ✔ replicas (only primaries queried) ✔ failover (replicas promoted)
  37. agenda: demo
  38. agenda: best practices
  39. only use explicit field mappings in production: {"type": "brewery", "random_number": 4, "edible": false}. Dynamic mappings are great, until…
  40. only use explicit field mappings in production: {"type": "brewery", "comments": <4k of text>, "random_number": 4, "edible": false}. A developer adds one small field…
  42. always use Index Aliases: index rebuilding
  43. always use Index Aliases: /users → /usersV1
  44. always use Index Aliases: /users → /usersV1, while /usersV2 is indexing (55%)
  45. always use Index Aliases: /users, /usersV1, /usersV2; atomic switch to /usersV2
  46. always use Index Aliases: /users → /usersV2 after the atomic switch
  47. go watch! Dave Starling, seenit
  48. agenda: status / roadmap / what’s next
  49. project status: FTS is a developer preview in 4.5 and the planned 4.6; GA is planned in Spock. Please help kick the tires: http://www.couchbase.com/download
  50. Couchbase FullText Search (FTS): thanks!
  51. links & Q+A: http://NICE-URL-TODO-HERE for downloads, getting started, tech docs, and where you can ask questions and share your feedback!
  52. EXTRA SLIDES
  53. FTS design: DCP streams from the couchbase nodes to the FTS nodes provide incremental index updates; a cfg bucket holds metadata about the indexes
  60. agenda: design
  61. FTS design / index partitioning
  67. FTS design / index partitioning: bucket partitions 0, 1, 2, 3, 4, … 1021, 1022, 1023 (1024 vbuckets) → index partitions (groups of vbuckets) A: 0-399, B: 400-799, C: 800-1023 → assigned to FTS nodes X, Y, Z, with replicas assigned, too
  69. FTS design / indexing: DCP streams from the couchbase nodes to the FTS nodes provide incremental index updates
  71. FTS design / queries: a query sent from your application via REST to any FTS node…
  72. …is scatter/gathered to the other FTS nodes
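Slide 15 shows an inverted index: for each term, the documents where it is found. Below is a minimal in-memory sketch in Python, assuming a naive analyzer (lowercase, split on whitespace, no stemming). The document texts are hypothetical, chosen only so the postings match the slide; this is not how FTS actually stores its index.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of document IDs that contain it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    """Return IDs of documents containing every query term (AND semantics)."""
    sets = [set(index.get(term, [])) for term in tokenize(query)]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical documents that reproduce the slide's postings.
docs = {
    1: "My dog has fleas",
    2: "My dog has a bone",
    3: "My cat has a toy",
    81: "The dog with fleas",
}
index = build_inverted_index(docs)
print(index["dog"])                    # [1, 2, 81]
print(search_all(index, "dog fleas"))  # [1, 81]
```

Looking up a term is a dictionary access rather than a scan of every document, and a multi-term AND query becomes an intersection of posting lists.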
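The point of slide 16 is that the same text analysis runs at index time and at query time, so “Beauty” and “Beautiful” both reduce to “beauti” and match. The sketch below uses a toy two-rule suffix stripper invented purely for illustration (strip a trailing “ful”, then turn a trailing “y” into “i”). It is not the Porter stemmer or any FTS analyzer; real stemmers handle far more cases.

```python
def toy_stem(token: str) -> str:
    """Hypothetical two-rule stemmer, for illustration only (not Porter)."""
    if token.endswith("ful") and len(token) > 5:
        token = token[: -len("ful")]          # beautiful -> beauti
    if token.endswith("y") and len(token) > 3:
        token = token[:-1] + "i"              # beauty -> beauti
    return token


def analyze(text: str) -> list[str]:
    """The analyzer applied identically to documents and to queries."""
    return [toy_stem(tok) for tok in text.lower().split()]


indexed_terms = set(analyze("Beauty"))   # document text, at index time
query_terms = analyze("Beautiful")       # user's search, at query time
print(indexed_terms, query_terms)        # {'beauti'} ['beauti']
print(all(t in indexed_terms for t in query_terms))  # True
```

If only the documents were stemmed, the raw query term “beautiful” would never match the indexed “beauti”; running the same analyzer on both sides is what produces the match.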
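Slide 18 gives the intuition behind TF/IDF. One common textbook form is score(t, d) = tf(t, d) × ln(N / df(t)), where N is the number of documents and df(t) the number containing t. The sketch below implements that form over a few hypothetical documents; FTS’s exact scoring formula is not shown on the slides and likely adds normalization, so treat this as an illustration of the idea only.

```python
import math
from collections import Counter


def tf_idf(term: str, doc_id: int, docs: dict[int, str]) -> float:
    """Textbook TF/IDF: raw count of `term` in the doc times ln(N / df).

    Re-tokenizes on every call for clarity; a real index stores these counts.
    """
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    tf = Counter(tokenized[doc_id])[term]                           # term frequency
    df = sum(1 for tokens in tokenized.values() if term in tokens)  # document frequency
    if tf == 0:
        return 0.0
    return tf * math.log(len(docs) / df)  # rarer term -> larger IDF factor


# Hypothetical documents.
docs = {
    1: "my dog has fleas and more fleas",
    2: "my dog has a bone",
    3: "my cat has a toy",
    4: "the dog sleeps",
}
print(round(tf_idf("fleas", 1, docs), 3))  # 2.773  (2 * ln 4)
print(round(tf_idf("dog", 1, docs), 3))    # 0.288  (1 * ln(4/3))
```

“fleas” outscores “dog” in document 1 on both counts: it occurs twice there (higher TF) and appears in only one of four documents (higher IDF). A term present in every document gets ln(1) = 0 and contributes nothing.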
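The definitions on slide 21 translate directly into set arithmetic. In this minimal sketch, `matched` is what a query returned and `relevant` is a hypothetical human judgment of what it should have returned; the “broad” and “narrow” result sets are invented to show the trade-off the slide warns about.

```python
def precision_recall(matched: set[str], relevant: set[str]) -> tuple[float, float]:
    """precision = |matched & relevant| / |matched|;
    recall = |matched & relevant| / |relevant|."""
    hits = len(matched & relevant)
    precision = hits / len(matched) if matched else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}  # hypothetical relevance judgments

# Hypothetical aggressive analysis: finds everything relevant, plus noise.
broad = {"d1", "d2", "d3", "d4", "d7", "d8", "d9", "d10"}
# Hypothetical conservative analysis: only clean matches, but misses some.
narrow = {"d1", "d2"}

print(precision_recall(broad, relevant))   # (0.5, 1.0)
print(precision_recall(narrow, relevant))  # (1.0, 0.5)
```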
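Slides 39 and 40 warn that a dynamic mapping indexes whatever fields show up, so one new 4k `comments` field quietly grows the index. The conceptual sketch below is not the Couchbase FTS mapping format: `mapping=None` stands for a dynamic mapping, and the hypothetical `EXPLICIT_MAPPING` lists, per document type, the only fields to index.

```python
from typing import Any, Optional

# Hypothetical explicit mapping: document type -> fields to index.
EXPLICIT_MAPPING: dict[str, set[str]] = {
    "brewery": {"random_number", "edible"},
}


def fields_to_index(doc: dict[str, Any],
                    mapping: Optional[dict[str, set[str]]]) -> set[str]:
    """Names of the fields that would be indexed for `doc`.

    mapping=None models a dynamic mapping: every field is indexed.
    Otherwise only fields listed for the document's type are indexed,
    and documents of unmapped types contribute nothing.
    """
    if mapping is None:
        return set(doc)
    return set(doc) & mapping.get(doc.get("type"), set())


doc = {"type": "brewery", "random_number": 4, "edible": False}
doc_v2 = {**doc, "comments": "x" * 4096}  # the "one small field": 4k of text

print(sorted(fields_to_index(doc_v2, None)))
# ['comments', 'edible', 'random_number', 'type']
print(sorted(fields_to_index(doc_v2, EXPLICIT_MAPPING)))
# ['edible', 'random_number']
```

With the explicit mapping, the new field is simply ignored until someone deliberately decides to index it.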
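Slides 42-46 show the alias pattern: applications always query `/users`, which points at `/usersV1` while `/usersV2` is built, then switches to `/usersV2` in one step. FTS provides index aliases itself; the hypothetical `AliasRegistry` class below only models the idea in-process and is not a Couchbase API. The lock keeps a repoint atomic with respect to concurrent lookups.

```python
import threading


class AliasRegistry:
    """Hypothetical alias table: alias name -> concrete index name."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._targets: dict[str, str] = {}

    def point(self, alias: str, index: str) -> None:
        """Create an alias or atomically repoint an existing one."""
        with self._lock:
            self._targets[alias] = index

    def resolve(self, alias: str) -> str:
        """Return the index the alias currently points at."""
        with self._lock:
            return self._targets[alias]


aliases = AliasRegistry()
aliases.point("/users", "/usersV1")  # applications query /users, served by V1
# ... /usersV2 is built in the background ("Indexing 55%") ...
aliases.point("/users", "/usersV2")  # atomic switch once V2 is complete
print(aliases.resolve("/users"))     # /usersV2
```

Because clients only ever name `/users`, the rebuild and the switch need no application change, and the old index can be retired afterwards.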
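The index-partitioning slides (61-68) group 1024 vbuckets into index partitions A (0-399), B (400-799), and C (800-1023), assign them to FTS nodes X, Y, and Z, and place replicas too. The sketch below reproduces that grouping with a simple round-robin placement that puts each replica on a different node from its primary. The 400-vbucket partition size, the placement rule, and the CRC32-based `vbucket_for_key` are assumptions chosen for illustration, not FTS’s actual planner or Couchbase’s exact vbucket hash.

```python
import zlib

NUM_VBUCKETS = 1024
VBUCKETS_PER_PARTITION = 400  # assumption chosen to reproduce the slide


def vbucket_for_key(key: str) -> int:
    """Simplified hash partitioning; NOT Couchbase's exact vbucket hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def index_partitions() -> dict[str, range]:
    """Group consecutive vbuckets into index partitions named A, B, C, ..."""
    parts = {}
    for i, start in enumerate(range(0, NUM_VBUCKETS, VBUCKETS_PER_PARTITION)):
        stop = min(start + VBUCKETS_PER_PARTITION, NUM_VBUCKETS)
        parts[chr(ord("A") + i)] = range(start, stop)
    return parts


def partition_for_key(key: str, parts: dict[str, range]) -> str:
    """Find the index partition whose vbucket range holds this key."""
    vb = vbucket_for_key(key)
    return next(name for name, vbs in parts.items() if vb in vbs)


def place(partitions: list[str], nodes: list[str],
          replicas: int = 1) -> dict[str, list[str]]:
    """Round-robin: primary on one node, each replica on the next node(s)."""
    if replicas >= len(nodes):
        raise ValueError("each copy of a partition needs its own node")
    return {part: [nodes[(i + r) % len(nodes)] for r in range(replicas + 1)]
            for i, part in enumerate(partitions)}


parts = index_partitions()
print({name: (vbs.start, vbs.stop - 1) for name, vbs in parts.items()})
# {'A': (0, 399), 'B': (400, 799), 'C': (800, 1023)}
print(place(list(parts), ["X", "Y", "Z"]))
# {'A': ['X', 'Y'], 'B': ['Y', 'Z'], 'C': ['Z', 'X']}  (first entry = primary)
print(partition_for_key("brewery_21st_amendment", parts))
```

With the first node in each list acting as primary, losing node X leaves partition A served by its replica on Y, which is the “failover (replicas promoted)” behavior listed on slide 36.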
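Slides 53 and 69 describe FTS nodes consuming DCP streams for incremental index updates. The sketch below models only the idea of incremental maintenance: it applies a hypothetical sequence of `(op, doc_id, text)` events to an inverted index, removing a document’s old postings before adding the new ones. The event format is invented for illustration and is not the DCP protocol.

```python
from collections import defaultdict


class IncrementalIndex:
    """Inverted index kept current from a stream of mutations and deletions."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.doc_terms: dict[str, set[str]] = {}  # needed to undo old postings

    def _remove(self, doc_id: str) -> None:
        """Drop every posting contributed by the previous version of doc_id."""
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def apply(self, op: str, doc_id: str, text: str = "") -> None:
        """Apply one hypothetical ('mutation' | 'deletion', id, text) event."""
        if op not in ("mutation", "deletion"):
            raise ValueError(f"unknown op {op!r}")
        self._remove(doc_id)  # an update replaces the old version entirely
        if op == "mutation":
            terms = set(text.lower().split())
            self.doc_terms[doc_id] = terms
            for term in terms:
                self.postings[term].add(doc_id)


idx = IncrementalIndex()
for event in [("mutation", "doc1", "my dog has fleas"),
              ("mutation", "doc2", "my dog has a bone"),
              ("mutation", "doc1", "my dog is clean"),  # doc1 updated
              ("deletion", "doc2", "")]:
    idx.apply(*event)
print(sorted(idx.postings["dog"]))  # ['doc1']
print("fleas" in idx.postings)      # False
```

Only the changed document is touched per event, which is what makes streaming updates cheaper than rebuilding the whole index.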
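Slides 71 and 72 show a query sent to any FTS node being scatter/gathered across the other FTS nodes, and slide 36 notes that partial results are OK. The sketch below uses Python threads and assumes hypothetical per-partition search functions (one per primary partition) that return `(doc_id, score)` pairs. A partition that raises or times out is reported by name instead of failing the whole query.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable

Hit = tuple[str, float]  # (doc_id, score)


def scatter_gather(query: str,
                   partitions: dict[str, Callable[[str], list[Hit]]],
                   size: int = 10,
                   timeout: float = 1.0) -> tuple[list[Hit], list[str]]:
    """Send `query` to every partition in parallel and merge hits by score.

    Returns (top `size` hits, sorted names of partitions that failed or
    timed out), so the caller gets partial results instead of an error.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(partitions)))
    futures = {pool.submit(search, query): name
               for name, search in partitions.items()}     # scatter
    done, not_done = wait(futures, timeout=timeout)
    hits: list[Hit] = []
    failed = [futures[f] for f in not_done]                 # timed out
    for fut in done:                                        # gather
        try:
            hits.extend(fut.result())
        except Exception:                                   # partition errored
            failed.append(futures[fut])
    pool.shutdown(wait=False)  # don't block on stragglers
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:size], sorted(failed)


# Hypothetical per-partition search functions.
def partition_a(query: str) -> list[Hit]:
    return [("doc1", 0.9), ("doc4", 0.2)]


def partition_b(query: str) -> list[Hit]:
    return [("doc7", 0.5)]


def partition_c(query: str) -> list[Hit]:
    raise ConnectionError("node unreachable")  # simulated failure


hits, failed = scatter_gather(
    "fleas", {"A": partition_a, "B": partition_b, "C": partition_c})
print(hits)    # [('doc1', 0.9), ('doc7', 0.5), ('doc4', 0.2)]
print(failed)  # ['C']
```

Merging raw scores assumes they are comparable across partitions; when each partition computes TF/IDF statistics over only its own documents, that holds only approximately.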
