  • Meeting Notes (9/12/12 15:47)
      – explain that it will be separate cluster infrastructure; JSON schema free; possibly add graphic here; add the term full-text; integrate with elasticsearch for full-text
      – do a better job on document types; document, field, type, index
      – make the text bigger; add another slide with deeper explanation
      – clarify that installation is on the ES node
      – emphasize full-text query; come up with some way to emphasize which queries are ES and which are couchbase
      – add full architecture slide here
      – add clarification about which come from which; check better image
      – show some other indexers; add NOTE you need to reindex
      – add link to guide
      – ask for more feedback here
  • Transcript of "CouchConf_Full Text Search"

    1. Couchbase Server 2.0: Full Text Search Integration
        John Zablocki, Developer Advocate
    2. Couchbase Server 2.0: Distributed Indexing and Querying using Incremental Map Reduce
        [Diagram: a query/response is spread across Server 1, Server 2 and Server 3; each server holds active docs and replica docs drawn from Doc 1 through Doc 9.]
    3. Search Across Full JSON Body
            {
              "name": "Abbey Belgian Style Ale",
              "description": "Winner of four World Beer Cup medals and eight medals at the Great American Beer Fest, Abbey Belgian Ale is the Mark Spitz of New Belgium’s lineup – but it didn’t start out that way."
            }
        Search term: abbey
    4. Search Across Full JSON Body (same content as slide 3)
    5. Integrate with ElasticSearch for Full Text Search
        • Based on proven Apache Lucene technology
        • Apache 2 Licensed with commercial support available
        • Distributed
        • Schema Free JSON Documents
        • RESTful API
    6. ElasticSearch Terminology
        • Document
          – Schema-less JSON…
          – Contains a set of fields
        • Type
          – Contains a set of mappings describing how fields are indexed
        • Index
          – Logical namespace for scoping indexing/searching
          – May contain documents of different types
          – Uniqueness by ID/Type
    7. How does it work?
        [Diagram: Unidirectional Cross Data Center Replication (XDCR) to ElasticSearch]
    8. GETTING STARTED
    9. Install the Couchbase Plug-In
        • Pre-requisite
          – Existing Couchbase and ElasticSearch Clusters
        • Install the ElasticSearch Couchbase Transport Plug-in
          – bin/plugin -install couchbaselabs/elasticsearch-transport-couchbase/1.0.0-beta
        • Configure the Plug-in
          – Set a password
          – Install the Couchbase Index Template
        • Restart ElasticSearch
        • Create an ElasticSearch index for your documents
    10. Configure XDCR (part 1)
    11. Configure XDCR (part 2)
    12. Documents are now being indexed!
        [Screenshot callout: Document Count Increasing]
    13. WHAT NOW?
    14. Document from Beer Sample Dataset
            {
              "name": "Pabst Blue Ribbon",
              "abv": 4.74,
              "ibu": 0,
              "srm": 0,
              "upc": 0,
              "type": "beer",
              "brewery_id": "110f1d5dc2",
              "updated": "2010-07-22 20:00:20",
              "description": "PBR is not just any beer…",
              "style": "American-Style Light Lager",
              "category": "North American Lager"
            }
    15. Simple ES Query with HTTP
        • Search for any beer matching the term “lager”
          – GET http://127.0.0.1:9200/beer-sample/_search?q=lager
            {
              "took": 7,
              "timed_out": false,
              "_shards": { ... },
              "hits": {
                "total": 1271,
                "max_score": 1.1145955,
                "hits": [...]
              }
            }
    16. Simple ES Query with HTTP (same query and response). Callout on "took": Total Search Execution Time
    17. Simple ES Query with HTTP (same query and response). Callout on "total": Total Number of Documents Matching Query
    18. Simple ES Query with HTTP (same query and response). Callout on "max_score": Maximum Score of All Matching Documents
    19. Simple ES Query with HTTP (same query and response). Callout on "hits": Array of Matching Documents
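The four callouts on slides 16-19 can be read off the response programmatically. The following is a minimal sketch rather than anything from the talk: the helper names `build_search_url` and `summarize_response` are made up for illustration, and the sample response is the one from the slide with its elided parts left empty. It builds the URL only and does not contact a server.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, index, term):
    """Build a URI-search URL such as .../beer-sample/_search?q=lager."""
    return f"{base_url}/{index}/_search?{urlencode({'q': term})}"


def summarize_response(body):
    """Pull out the fields that the slide callouts point at."""
    hits = body["hits"]
    return {
        "took": body["took"],            # total search execution time
        "timed_out": body["timed_out"],
        "total": hits["total"],          # documents matching the query
        "max_score": hits["max_score"],  # best score among the matches
        "returned": len(hits["hits"]),   # size of the hits array
    }


# Response from the slide; "_shards" and the hits array were elided there,
# so they are left empty here rather than filled in.
sample = json.loads("""
{"took": 7, "timed_out": false, "_shards": {},
 "hits": {"total": 1271, "max_score": 1.1145955, "hits": []}}
""")

url = build_search_url("http://127.0.0.1:9200", "beer-sample", "lager")
summary = summarize_response(sample)
print(url)
print(summary)
```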
    20. Single Search Result
            "hits": [ {
              "_index": "beer-sample",
              "_type": "couchbaseDocument",
              "_id": "110fc4b16b",
              "_score": 1.1145955,
              "_source": {
                "meta": {
                  "id": "110fc4b16b",
                  "rev": "1-001ba0044ce30dd50000000000000000",
                  "flags": 0,
                  "expiration": 0
                }
              }
            }, … ]
        Callout: ID of Matching Document
    21. Single Search Result (same result as slide 20)
        Where’s the document body?
    22. Recommended Usage Pattern
        1. ElasticSearch Query
        2. ElasticSearch Result
        3. Couchbase Multi-GET
        4. Couchbase Result
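The four steps of the usage pattern can be sketched as follows. This is an illustration under stated assumptions, not SDK code: `multi_get` is a hypothetical stand-in for a Couchbase client's bulk get, the bucket is a plain dictionary, and the document body in it is a placeholder. The search hit mirrors slide 20, which carries only metadata and no body.

```python
def ids_from_hits(response):
    """Steps 1-2: collect document IDs from an ElasticSearch response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def multi_get(bucket, keys):
    """Steps 3-4: hypothetical stand-in for an SDK multi-get.

    Here the "bucket" is just a dict; missing keys are skipped.
    """
    return {key: bucket[key] for key in keys if key in bucket}


# Search hit shaped like the one on slide 20: metadata only, no body.
es_response = {
    "hits": {
        "hits": [
            {"_id": "110fc4b16b", "_score": 1.1145955,
             "_source": {"meta": {"id": "110fc4b16b"}}},
        ]
    }
}

# In-memory stand-in for the Couchbase bucket; the body is a placeholder.
bucket = {"110fc4b16b": {"type": "beer", "name": "placeholder body"}}

documents = multi_get(bucket, ids_from_hits(es_response))
print(documents)
```

Keeping the full documents in Couchbase and only references in the index is what makes the second round trip necessary.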
    23. Architecture Overview
        [Diagram: the App Server uses the Couchbase SDK for data and MR (view) queries against the Couchbase Server Cluster, and sends ES queries over HTTP to the Index Server Cluster, which returns refs. XDCR connects the Couchbase Server Cluster to the Index Server Cluster through the Couchbase ES Transport.]
    24. MORE ADVANCED CAPABILITIES
    25. Another Query with HTTP
        • POST http://127.0.0.1:9200/default/_search
            {
              "query": {
                "query_string": {
                  "query": "style: lambic AND description: blueberry"
                }
              }
            }
        Matching document:
            {
              "name": "Wild Blue Blueberry Lager",
              "abv": 8,
              "type": "beer",
              "brewery_id": "110f01abce",
              "updated": "2010-07-22 20:00:20",
              "description": "…ripe blueberry aroma…",
              "style": "Belgian-Style Fruit Lambic",
              "category": "Belgian and French Ale"
            }
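A small sketch of how the POST body on slide 25 could be assembled from field/term pairs. The helpers `query_string_body` and `build_request` are illustrative names, not part of any client library; the request object is only constructed here, never sent.

```python
import json
import urllib.request


def query_string_body(**field_terms):
    """Join field:term pairs with AND into a query_string query."""
    clauses = [f"{field}: {term}" for field, term in field_terms.items()]
    return {"query": {"query_string": {"query": " AND ".join(clauses)}}}


def build_request(url, body):
    """Wrap the JSON body in a POST request (constructed, not sent)."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = query_string_body(style="lambic", description="blueberry")
request = build_request("http://127.0.0.1:9200/default/_search", body)
print(request.get_method(), request.full_url)
print(json.dumps(body, indent=2))
```

Passing `request` to `urllib.request.urlopen` would execute the search against a running node.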
    26. Faceted Search
        [Screenshot callouts: Categories, Items with Counts, Range Facets]
    27. Faceted Search Query – Beer Style
            {
              "query": {
                "query_string": {
                  "query": "bud"
                }
              },
              "facets": {
                "styles": {
                  "terms": {
                    "field": "style",
                    "size": 3
                  }
                }
              }
            }
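The facet query on slide 27 has a regular shape, so it can be generated. A minimal sketch, with `terms_facet_body` as a made-up helper name; it only builds the request body shown on the slide.

```python
import json


def terms_facet_body(text, facet_name, field, size):
    """Build a query_string search with one terms facet attached."""
    return {
        "query": {"query_string": {"query": text}},
        "facets": {facet_name: {"terms": {"field": field, "size": size}}},
    }


facet_query = terms_facet_body("bud", "styles", "style", 3)
print(json.dumps(facet_query, indent=2))
```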
    28. Faceted Search Results - Incorrect
            "terms": [
              { "term": "style", "count": 8 },
              { "term": "lager", "count": 6 },
              { "term": "american", "count": 4 }
            ]
        Style was “American-Style Lager”
    29. Update the Mapping
        • PUT /beer-sample/couchbaseDocument/_mapping
            {
              "couchbaseDocument": {
                "properties": {
                  "doc": {
                    "properties": {
                      "style": {
                        "type": "string",
                        "index": "not_analyzed"
                      }
                    }
                  }
                }
              }
            }
        NOTE: When you change the mapping you MUST re-index.
    30. Faceted Search Results - Correct
            "terms": [
              { "term": "American-Style Light Lager", "count": 5 },
              { "term": "American-Style Lager", "count": 2 },
              { "term": "Belgian-Style White", "count": 1 }
            ]
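Why the first facet result was wrong can be shown with a small simulation. This is a rough sketch, not ElasticSearch itself: `analyzed_terms` only approximates an analyzer by lowercasing and splitting on non-alphanumeric characters, and the input list is built from the style counts on slide 30, so it reproduces the shape of the problem on slide 28 rather than its exact counts.

```python
import re
from collections import Counter


def analyzed_terms(value):
    """Rough analyzer stand-in: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def terms_facet(values, analyzed, size=3):
    """Count documents per term, like a terms facet of the given size."""
    counts = Counter()
    for value in values:
        if analyzed:
            # Each distinct token counts once per document.
            terms = list(dict.fromkeys(analyzed_terms(value)))
        else:
            # not_analyzed: the whole field value is a single term.
            terms = [value]
        counts.update(terms)
    return counts.most_common(size)


styles = (["American-Style Light Lager"] * 5
          + ["American-Style Lager"] * 2
          + ["Belgian-Style White"])

print(terms_facet(styles, analyzed=True))   # fragments such as "style"
print(terms_facet(styles, analyzed=False))  # full style names
```

With analysis on, every style contributes the token "style", which is why it tops the incorrect result; with `not_analyzed`, the facet terms are the style names themselves.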
    31. Faceted Search Query – % Alcohol Range
            {
              "query": {
                "query_string": {
                  "query": "bud"
                }
              },
              "facets": {
                "abv": {
                  "range": {
                    "abv": [
                      { "to": 3 },
                      { "from": 3, "to": 5 },
                      { "from": 5 }
                    ]
                  }
                }
              }
            }
    32. Faceted Search Results - % Alcohol Range
            "ranges": [
              { "to": 3, "count": 1 },
              { "from": 3, "to": 5, "count": 5 },
              { "from": 5, "count": 3 }
            ]
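The bucketing behind a range facet can be sketched in a few lines. Two assumptions are worth stating: the buckets are treated as inclusive of "from" and exclusive of "to", and the `abv_values` list is invented so that the counts come out as 1, 5 and 3 like the slide; the real values behind the slide are not given.

```python
def range_facet(values, ranges):
    """Count values per range; "from" inclusive, "to" exclusive (assumed)."""
    results = []
    for bucket in ranges:
        low, high = bucket.get("from"), bucket.get("to")
        count = sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        results.append({**bucket, "count": count})
    return results


# Same three ranges as the query on slide 31.
ranges = [{"to": 3}, {"from": 3, "to": 5}, {"from": 5}]

# Invented abv values, chosen only to reproduce the slide's counts.
abv_values = [2.5, 3.0, 4.1, 4.2, 4.2, 4.74, 5.0, 5.9, 8.0]

for row in range_facet(abv_values, ranges):
    print(row)
```

Because the boundaries are shared (3 and 5), the inclusive/exclusive rule decides which bucket a beer at exactly 3% or 5% falls into.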
    33. Search Result Scoring
        • Each matching document is assigned a score based on how well it matches the query
            hits: [ {
              "_index": "default",
              "_type": "couchbaseDocument",
              "_id": "35addbc374",
              "_score": 1.1306798,
              …
    34. Custom Scoring – Document Properties
        • Each document has a numerical field “abv”
        • Let’s use this field to boost the beer’s natural score
            {
              "query": {
                "custom_score": {
                  "query": {
                    "query_string": {
                      "query": "bud"
                    }
                  },
                  "script": "_score * doc['abv'].value"
                }
              }
            }
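The effect of the script `_score * abv` on ranking can be worked through by hand. A minimal sketch with two invented hits (the names, scores and abv values are placeholders, not results from the talk):

```python
def custom_score(hits):
    """Multiply each hit's score by its abv and re-rank, highest first."""
    rescored = [{**hit, "_score": hit["_score"] * hit["abv"]} for hit in hits]
    return sorted(rescored, key=lambda hit: hit["_score"], reverse=True)


# Invented hits: beer A matches the text better, beer B is stronger.
hits = [
    {"name": "beer A", "_score": 1.5, "abv": 4.0},
    {"name": "beer B", "_score": 1.0, "abv": 8.0},
]

for hit in custom_score(hits):
    print(hit["name"], hit["_score"])
```

Beer A scores 1.5 x 4.0 = 6.0 and beer B scores 1.0 x 8.0 = 8.0, so the stronger beer overtakes the better text match.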
    35. Custom Scoring – User Preferences
        • Let users rank beer styles from 1-10
        • User with no preferences set searches for “bud”
            Name                     Style                        Score
            Bud Extra                                             1.5409653
            Bud Light Lime           American-Style Light Lager   1.513119
            Bud Light Golden Wheat   Belgian-Style White          1.3208274
            Bud Ice                  American-Style Lager         1.2839241
            Bud Ice Light            American-Style Lager         1.2839241
            Bud Light                American-Style Light Lager   1.245288
            Bud Dry                  American-Style Light Lager   1.1968427
            Budweiser Select         American-Style Light Lager   0.8559494
            Miller Lite              American-Style Light Lager   0.7201389
    36. Custom Scoring – User Preferences
        • User ranks “Belgian-Style White” with value 10
            {
              "query": {
                "custom_filters_score": {
                  "query": {
                    "text": { "_all": "bud" }
                  },
                  "filters": [
                    {
                      "filter": {
                        "term": { "style": "Belgian-Style White" }
                      },
                      "boost": "10"
                    }
                  ],
                  "score_mode": "first"
                }
              }
            }
    37. Custom Scoring – User Preferences
            Name                     Style                        Score
            Bud Light Golden Wheat   Belgian-Style White          13.208274
            Bud Extra                                             1.5409653
            Bud Light Lime           American-Style Light Lager   1.513119
            Bud Light Golden Wheat   Belgian-Style White          1.3208274
            Bud Ice                  American-Style Lager         1.2839241
            Bud Ice Light            American-Style Lager         1.2839241
            Bud Light                American-Style Light Lager   1.245288
            Bud Dry                  American-Style Light Lager   1.1968427
            Budweiser Select         American-Style Light Lager   0.8559494
            Miller Lite              American-Style Light Lager   0.7201389
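The jump from 1.3208274 to 13.208274 is the original score multiplied by the boost of 10. The sketch below imitates that behaviour under an assumption about `score_mode: first`, namely that the boost of the first matching filter is applied and unmatched hits keep their score; `custom_filters_score` here is a local illustration, not the ElasticSearch implementation. The three hits are taken from the table on slide 35.

```python
def custom_filters_score(hits, filters):
    """Apply the boost of the first matching filter to each hit's score.

    `filters` is a list of (predicate, boost) pairs; hits matching no
    filter keep their original score.
    """
    rescored = []
    for hit in hits:
        boost = next((b for matches, b in filters if matches(hit)), 1.0)
        rescored.append({**hit, "_score": hit["_score"] * boost})
    return sorted(rescored, key=lambda hit: hit["_score"], reverse=True)


# Three rows from the unboosted result table.
hits = [
    {"name": "Bud Extra", "style": None, "_score": 1.5409653},
    {"name": "Bud Light Lime", "style": "American-Style Light Lager",
     "_score": 1.513119},
    {"name": "Bud Light Golden Wheat", "style": "Belgian-Style White",
     "_score": 1.3208274},
]

# The user's single preference: Belgian-Style White ranked 10.
preferences = [(lambda hit: hit["style"] == "Belgian-Style White", 10.0)]

for hit in custom_filters_score(hits, preferences):
    print(f'{hit["name"]:<24} {hit["_score"]:.6f}')
```

Storing one (style, rank) pair per preference lets the same search be re-ranked per user without touching the index.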
    38. Learning Portal – Proof of Concept
    39. NEXT STEPS
    40. Explore ElasticSearch Capabilities
        • Customize Document Mappings
          – Default behavior isn’t always what you want
          – Index one field multiple ways
        • Advanced Cluster Topologies
          – Dedicate nodes for routing/querying
        • Rich Query DSL
        ElasticSearch Guide: http://www.elasticsearch.org/guide/
    41. Couchbase ElasticSearch Future
        • Release 1.0.0
        • Possible features for future
          – More fine-grained cluster configuration
          – More index-level configuration
          – Pre-index script execution
          – Indexing non-JSON data
        • Give us your feedback!
    42. Resources
        • Marty Schoch’s blog: http://blog.couchbase.com/couchbase-and-full-text-search-couchbase-transport-elastic-search
        • https://github.com/couchbaselabs/elasticsearch-transport-couchbase
        • john@couchbase.com
        • @codevoyeur