CouchDB Day NYC 2017: Full Text Search


  1. CouchDB Developer Day Full-Text Search Lab
  2. Create a Cloudant account • Go to • Sign up!
  3. Setup
     curl $ -X PUT
     curl $ -X PUT -d '{"indexes":{"baz":{"index":"function(doc){index(\"color\", doc.color); index(\"size\", doc.size);}"}}}'
     curl $ 1 -X PUT -d '{"size": "small", "color": "green"}'
     curl $ -X PUT -d '{"size": "large", "color": "green"}'
     curl $ -X PUT -d '{"size": "small", "color": "red"}'
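The design document in the Setup step stores the index function as a string, so the double quotes inside it have to be escaped in the JSON body. The following is a minimal JavaScript sketch, assuming Node.js, that lets `JSON.stringify` do the escaping and then runs the same function against the three sample documents. The `index` stand-in and the `indexedFields` helper are hypothetical and exist only for illustration; the real `index` function is provided by the search indexer.

```javascript
// The index function from the Setup step, kept as source text because a
// design document stores it as a string.
const indexSource =
  'function(doc){index("color", doc.color); index("size", doc.size);}';

// JSON.stringify escapes the inner double quotes, producing a body that can
// be sent with `curl -d` or any HTTP client.
const designDocBody = JSON.stringify({
  indexes: { baz: { index: indexSource } },
});
console.log(designDocBody);

// Hypothetical helper: evaluate the index function with a stand-in `index`
// that records each (field name, value) pair, to show what would be indexed.
function indexedFields(source, doc) {
  const fields = {};
  const index = (name, value) => {
    fields[name] = value;
  };
  // Wrap the source so that `index` is in scope when the function runs.
  const fn = new Function("index", `return (${source});`)(index);
  fn(doc);
  return fields;
}

// The three sample documents from the Setup step.
const sampleDocs = [
  { size: "small", color: "green" },
  { size: "large", color: "green" },
  { size: "small", color: "red" },
];

for (const doc of sampleDocs) {
  console.log(indexedFields(indexSource, doc));
}
```

Each sample document ends up with two searchable fields, `color` and `size`, which are the field names used in the search and sort queries.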
  4. Searching
     curl $
     curl $
     curl $
     curl $
  5. Pagination
     Every search response includes a "bookmark" attribute. Pass it back to Cloudant in the next request to get the next "page" of results.
     curl https://$*:*&limit=1
     curl https://$*:*&limit=1&bookmark=g2wAAAABaANkAB9kYmNvcmVAZGI1LmplbmV2ZXIuY2xvdWRhbnQubmV0bAAAAAJhAGI_____amgCRj_wAAAAAAAAYQBq
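The bookmark round trip can be sketched as a loop: request a page, keep its rows, and send the returned bookmark with the next request. This JavaScript sketch assumes Node.js 18 or later (for the global `fetch`) and assumes that the end of the result set is signalled by a page with an empty `rows` array. The names `fetchAllRows`, `fetchPage`, `httpFetchPage` and `searchUrl` are hypothetical, and authentication is left out.

```javascript
// Collect every row by following bookmarks until a page comes back empty.
// `fetchPage` is a caller-supplied function: given a bookmark (undefined for
// the first page), it resolves to the parsed JSON search response.
async function fetchAllRows(fetchPage) {
  const rows = [];
  let bookmark;
  for (;;) {
    const page = await fetchPage(bookmark);
    // Assumption: an empty page means there are no more results.
    if (!page.rows || page.rows.length === 0) break;
    rows.push(...page.rows);
    bookmark = page.bookmark;
  }
  return rows;
}

// One possible `fetchPage` over HTTP. `searchUrl` stands for the full URL of
// the search index; it is a placeholder, not a real endpoint.
function httpFetchPage(searchUrl, limit = 1) {
  return async (bookmark) => {
    const url = new URL(searchUrl);
    url.searchParams.set("q", "*:*");
    url.searchParams.set("limit", String(limit));
    if (bookmark) url.searchParams.set("bookmark", bookmark);
    const response = await fetch(url);
    return response.json();
  };
}
```

Usage would look like `fetchAllRows(httpFetchPage(searchUrl))`. Passing the page fetcher in as a parameter keeps the loop independent of the HTTP client, so it can be exercised with canned responses.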
  6. Sorting
     The "sort" parameter lets you sort results on any indexed field or combination of indexed fields.
     curl https://$*:*&sort="size<string>"
     curl https://$*:*&sort="color<string>"
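The value of "sort" is a JSON string, so its quotes and angle brackets need URL-encoding when the query string is built in code rather than typed into a shell. The following JavaScript sketch builds the parameter; it assumes that a combination of fields is passed as a JSON array and that a leading "-" requests descending order. `sortParam` is a hypothetical helper name.

```javascript
// Build a URL-encoded "sort" query parameter from one or more sort keys of
// the form "field<type>", for example "size<string>".
function sortParam(fields) {
  // A single key is sent as a JSON string, several keys as a JSON array.
  const value = fields.length === 1 ? fields[0] : fields;
  return "sort=" + encodeURIComponent(JSON.stringify(value));
}

console.log(sortParam(["size<string>"]));
// sort=%22size%3Cstring%3E%22

console.log(sortParam(["-size<string>", "color<string>"]));
// sort=%5B%22-size%3Cstring%3E%22%2C%22color%3Cstring%3E%22%5D
```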
  7. Tokenization
     • Tokenizers break down textual input into tokens for efficient and flexible searching
     • Using an appropriate tokenizer is often critical
     • Generic analyzers: standard, email, keyword, whitespace
     • Language-specific analyzers: english, french, german, spanish, chinese, dutch...
     • You can configure different analyzers for different fields
     • Some tokenizers omit common words (stop words)
     • Some tokenizers omit common prefixes or suffixes (stemming)
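Configuring different analyzers for different fields is done in the design document, next to the index function. The following JavaScript sketch builds such a design document with the "perfield" analyzer. The field names come from the Setup step, while the choice of "keyword" for both fields and "standard" as the default is an assumption made for illustration only.

```javascript
// Design document with a per-field analyzer configuration.
const designDoc = {
  indexes: {
    baz: {
      analyzer: {
        name: "perfield",
        // Used for any field that is not listed under `fields`.
        default: "standard",
        // Illustrative choice: treat each value as a single exact token.
        fields: { color: "keyword", size: "keyword" },
      },
      index:
        'function(doc){index("color", doc.color); index("size", doc.size);}',
    },
  },
};

// The serialized form is what would be sent as the design document body.
console.log(JSON.stringify(designDoc, null, 2));
```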
  8. Tokenization Examples
     > curl https://$ -Hcontent-type:application/json -d '{"analyzer":"standard", "text": ""}'
     {"tokens":["rnewson",""]}
     > curl https://$ -Hcontent-type:application/json -d '{"analyzer":"email", "text": ""}'
     {"tokens":[""]}
     > curl https://$ -Hcontent-type:application/json -d '{"analyzer":"english", "text": "running"}'
     {"tokens":["run"]}