A sneak peek at CBFT: A Full Text Search for Couchbase: Couchbase Connect 2015

Support for full-text indexes is a major item on the Couchbase Server roadmap, so in 2014 we kicked off the open-source github.com/couchbaselabs/cbft project. The cbft server integrates the open-source bleve full-text indexing engine with Couchbase Server. This technical presentation covers cbft’s architecture, features, UI, APIs, status, and future directions.

Published in: Technology

  1. A SNEAK PEEK AT CBFT: Couchbase Full-Text Server. Marty Schoch & Steve Yen, Couchbase, Inc.
  2. ©2015 Couchbase Inc. about the speakers: Steve Yen, steve@couchbase.com, co-founder, Couchbase
  3. about the speakers: Marty Schoch, lead contributor to bleve, the most popular open-source full-text indexing engine for golang
  4. agenda: why cbft?; what’s full-text search and how’s it work?; design; demo; status / roadmap / what’s next
  6. why cbft? couchbase connectors… (figure: existing connectors, including Lucidworks, each marked "yes")
  7. why cbft? couchbase connectors… yet another tier & cluster to manage (figure: existing connectors, including Lucidworks, each marked "yes")
  8. why cbft? simple; integrated; 80/20 of features
  9. agenda: why cbft?; what’s full-text search and how’s it work?; design; demo; status / roadmap / what’s next
  10. what’s full-text search?
  11. advanced search
  12. search results
  13. search results: Spelling Suggestions
  14. search results: Spelling Suggestions; Result Text Snippets
  15. search results: Spelling Suggestions; Result Text Snippets; Highlighted Search Terms
  16. faceted search
  17. JSON document in Couchbase. Key: akay1980. Document: { "name": "Alan Kay", "description": "... the wisest engineer ..." }
  18. Text Analysis: tokenizer + token filters. A pipeline of transformations: one tokenizer, zero or more token filters
  19. Text Analysis: tokenizer + token filters. “… the wisest engineer …” → the, wisest, engineer • Seems like simple whitespace… but this doesn’t work for all languages • Unicode standard rules help (see Unicode Standard Annex #29) • Still need to account for exceptions • E-mail addresses and URLs don’t follow normal rules
  20. Text Analysis: tokenizer + token filters. the, wisest, engineer → Stop Word Removal → wisest, engineer → Stemming → wise, engineer
  21. Inverted Index: …; wise → …, akay1980, …; engineer → …, akay1980, …; …
  22. Search: query term "engineers" against the Inverted Index (wise → …, akay1980, …; engineer → …, akay1980, …)
  23. Search: engineers → engineer. Apply the same analysis at search time that we used at index time.
  24. Search: engineers → engineer → Exact Match in the Inverted Index (engineer → …, akay1980, …). Apply the same analysis at search time that we used at index time.
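
The analysis, indexing, and search steps on slides 18 to 24 can be sketched in a few lines of Python. This is a minimal illustration, not bleve's implementation: the stop-word list, the `toy_stem` rules, the regex tokenizer, and the AND semantics for multi-term queries are all assumptions chosen so that the slide example works.

```python
import re
from collections import defaultdict

# Illustrative stop-word subset (an assumption, not bleve's actual list).
STOP_WORDS = {"the", "a", "an", "of"}

def tokenize(text):
    # Simplification: take runs of word characters. Real tokenizers follow
    # Unicode segmentation rules and special-case e-mail addresses and URLs.
    return re.findall(r"\w+", text)

def lowercase(tokens):
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def toy_stem(tokens):
    # Toy suffix rules, just enough for "wisest" and "engineers".
    out = []
    for t in tokens:
        if len(t) > 4 and t.endswith("est"):
            t = t[:-3] + "e"
        elif len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
            t = t[:-1]
        out.append(t)
    return out

# One tokenizer, then zero or more token filters, applied in order.
TOKEN_FILTERS = [lowercase, remove_stop_words, toy_stem]

def analyze(text):
    tokens = tokenize(text)
    for token_filter in TOKEN_FILTERS:
        tokens = token_filter(tokens)
    return tokens

def build_index(docs):
    # Inverted index: term -> set of document keys containing that term.
    index = defaultdict(set)
    for key, doc in docs.items():
        for field_text in doc.values():
            for term in analyze(field_text):
                index[term].add(key)
    return index

def search(index, query):
    # The query goes through the same analysis as the indexed text, so
    # lookups are exact matches on terms. All query terms must match.
    results = None
    for term in analyze(query):
        postings = index.get(term, set())
        results = postings if results is None else results & postings
    return sorted(results or [])

docs = {
    "akay1980": {"name": "Alan Kay",
                 "description": "... the wisest engineer ..."},
}
index = build_index(docs)
print(analyze("the wisest engineer"))  # ['wise', 'engineer']
print(search(index, "engineers"))      # ['akay1980']
```

Because `search` reuses `analyze`, the query "engineers" is reduced to "engineer" before the lookup, which is why it finds a document that never contains the plural form.
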
  25. Document Scoring • tf/idf scoring • Term Frequency: how often does a term occur in a doc? More often yields a higher score • Inverse Document Frequency: how many docs have this term? More docs yield a lower score (because the term is more common)
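
The two rules on the scoring slide can be made concrete with a small sketch. The formula below (raw term count multiplied by `log(N / df) + 1`) is one common tf/idf variant picked for illustration; it is an assumption and not necessarily the exact formula bleve uses. The corpus is made up.

```python
import math

def tf_idf(term, doc_tokens, corpus):
    # Term frequency: how often the term occurs in this document.
    tf = doc_tokens.count(term)
    # Document frequency: how many documents contain the term at all.
    df = sum(1 for tokens in corpus if term in tokens)
    if tf == 0 or df == 0:
        return 0.0
    # Inverse document frequency: shrinks as more documents have the term.
    idf = math.log(len(corpus) / df) + 1.0
    return tf * idf

# Hypothetical corpus of already-analyzed documents.
d1 = ["wise", "engineer"]
d2 = ["engineer", "engineer", "bridge"]
d3 = ["wise", "owl"]
d4 = ["engineer", "train"]
corpus = [d1, d2, d3, d4]

# "engineer" appears twice in d2 and once in d1, so d2 scores higher.
print(tf_idf("engineer", d2, corpus) > tf_idf("engineer", d1, corpus))
# "wise" is in 2 of 4 docs, "engineer" in 3 of 4, so within d1 the
# rarer term "wise" contributes more.
print(tf_idf("wise", d1, corpus) > tf_idf("engineer", d1, corpus))
```
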
  26. Quality Results • Getting high-quality results depends on the right text analysis • Beware: adjustments that increase precision may reduce recall (and the other way around)
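
To make the precision/recall trade-off concrete, here is a small sketch using the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The document ids and the "strict" and "loose" result sets are hypothetical.

```python
def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Returning 0.0 for an empty set is a convention chosen for this sketch.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"d1", "d2", "d3", "d4"}

# Hypothetical strict analysis (for example, no stemming): few, accurate hits.
strict = {"d1", "d2"}
# Hypothetical loose analysis (for example, aggressive stemming): more
# relevant docs are found, but unrelated ones come along too.
loose = {"d1", "d2", "d3", "d8", "d9", "d10"}

print(precision_recall(strict, relevant))  # (1.0, 0.5)
print(precision_recall(loose, relevant))   # (0.5, 0.75)
```
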
  27. agenda: why cbft?; what’s full-text search and how’s it work?; design; demo; status / roadmap / what’s next
  28. cbft design / index partitioning
  29. cbft design / index partitioning. bucket partitions: 0, 1, 2, 3, 4, …, 1021, 1022, 1023 (1024 vbuckets)
  30. cbft design / index partitioning. bucket partitions: 0, 1, 2, 3, 4, …, 1021, 1022, 1023 (1024 vbuckets); index partitions: A, B, C
  32. cbft design / index partitioning. bucket partitions: 0, 1, 2, 3, 4, …, 1021, 1022, 1023 (1024 vbuckets); index partitions (groups of vbuckets): A = 0-399, B = 400-799, C = 800-1023
  33. cbft design / index partitioning. bucket partitions: 0, 1, 2, 3, 4, …, 1021, 1022, 1023 (1024 vbuckets); index partitions (groups of vbuckets): A = 0-399, B = 400-799, C = 800-1023; cbft nodes: X
  34. cbft design / index partitioning. bucket partitions: 0, 1, 2, 3, 4, …, 1021, 1022, 1023 (1024 vbuckets); index partitions (groups of vbuckets): A = 0-399, B = 400-799, C = 800-1023; assign to cbft nodes: X
  35. cbft design / index partitioning. bucket partitions: 0, 1, 2, 3, 4, …, 1021, 1022, 1023 (1024 vbuckets); index partitions (groups of vbuckets): A = 0-399, B = 400-799, C = 800-1023; assign to cbft nodes: X, Y, Z
  36. cbft design / index partitioning. bucket partitions: 0, 1, 2, 3, 4, …, 1021, 1022, 1023 (1024 vbuckets); index partitions (groups of vbuckets): A = 0-399, B = 400-799, C = 800-1023; assign to cbft nodes: X, Y, Z; replicas, too
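
The partitioning slides can be sketched as two small functions: one groups the 1024 vbuckets into index partitions, the other assigns each partition (plus replicas) to cbft nodes. The function names, the 400-vbucket cap, and the round-robin placement are assumptions made for illustration; the slides do not describe how cbft's planner actually places partitions.

```python
import string

def plan_partitions(num_vbuckets, max_per_partition):
    # Group consecutive vbuckets into index partitions of bounded size.
    return [
        range(start, min(start + max_per_partition, num_vbuckets))
        for start in range(0, num_vbuckets, max_per_partition)
    ]

def assign_to_nodes(partitions, nodes, num_replicas):
    # Hypothetical round-robin placement: the primary copy of partition i
    # goes to node i, and each replica goes to the next node along.
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas")
    layout = {}
    for i, _ in enumerate(partitions):
        name = string.ascii_uppercase[i]  # partitions named A, B, C, ...
        layout[name] = [nodes[(i + r) % len(nodes)]
                        for r in range(num_replicas + 1)]
    return layout

partitions = plan_partitions(num_vbuckets=1024, max_per_partition=400)
print([(p.start, p.stop - 1) for p in partitions])
# [(0, 399), (400, 799), (800, 1023)]

layout = assign_to_nodes(partitions, nodes=["X", "Y", "Z"], num_replicas=1)
print(layout)
# {'A': ['X', 'Y'], 'B': ['Y', 'Z'], 'C': ['Z', 'X']}
```

In each list the first node holds the primary copy and the rest hold replicas, so losing any single node still leaves one copy of every index partition.
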
  38. cbft design / indexing (diagram: three couchbase nodes feed three cbft nodes via DCP streams)
  40. cbft design / queries: your application → REST → cbft. A query sent to any cbft node…
  41. cbft design / queries: your application → REST → cbft. A query sent to any cbft node… …is scatter / gathered to the other cbft nodes
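
The scatter/gather step on the query slides can be sketched as follows. This is a simplified in-process model, not cbft's REST protocol: the `Node` class, its `search_local` method, the thread pool, and the sample hits are all hypothetical. The `scatter_gather` function plays the role of whichever cbft node received the query.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

class Node:
    """Hypothetical stand-in for a cbft node holding some index partitions."""

    def __init__(self, name, local_hits):
        self.name = name
        # term -> list of (doc_id, score) found in this node's partitions
        self.local_hits = local_hits

    def search_local(self, term):
        return self.local_hits.get(term, [])

def scatter_gather(nodes, term, limit=10):
    # Scatter: send the query to every node in parallel.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda node: node.search_local(term), nodes))
    # Gather: merge the partial results and keep the best-scoring hits.
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(limit, merged, key=lambda hit: hit[1])

nodes = [
    Node("X", {"engineer": [("akay1980", 0.9)]}),
    Node("Y", {"engineer": [("doc2", 0.4), ("doc7", 0.7)]}),
    Node("Z", {}),
]
print(scatter_gather(nodes, "engineer", limit=2))
# [('akay1980', 0.9), ('doc7', 0.7)]
```
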
  42. agenda: why cbft?; what’s full-text search and how’s it work?; design; demo; status / roadmap / what’s next
  44. project status: cbft is developer preview! please help kick the tires: http://labs.couchbase.com/cbft
  45. project status / roadmap / what’s next. today: bleve full-text engine; advanced mappings; faceted search; incremental indexing; index partitioning and replication; index aliases
  46. project status / roadmap / what’s next. today and future: bleve full-text engine; advanced mappings; faceted search; incremental indexing; index partitioning and replication; index aliases. future only: integrated into Couchbase Server & N1QL; API stability; production quality; performance optimization / tuning; forestdb storage & partial rollbacks; security, SSL; more docs, examples, SDK support
  47. links & Q+A: http://labs.couchbase.com/cbft (downloads, getting started, tech docs) and, share your feedback! THANKS! (and please do the survey!)
  50. cbft design (diagram: three couchbase nodes, three cbft nodes, and a cfg). DCP streams for incremental index updates; a cfg bucket holds metadata about the indexes
