Debugging and Testing ES Systems
Chris Birchall
2013/8/29
1st Elasticsearch Study Meetup (Elasticsearch 勉強会 第1回)
#elasticsearchjp

  1. Debugging and Testing ES Systems
     Chris Birchall
     2013/8/29
     1st Elasticsearch Study Meetup (Elasticsearch 勉強会 第1回)
     #elasticsearchjp
  2. Elasticsearch and me
     ● At Infoscience, helped build a log management product based on ES + Hadoop
     ● At M3, ES evangelist (??)
       ○ Maintain ES cluster
       ○ Help dev teams integrate ES into their apps
     Twitter: @cbirchall
     GitHub: https://github.com/cb372
  3. Search at M3
     ● Using ES for all new services
       ○ Search, recommendation (MoreLikeThis)
     ● Slowly migrating other services from Solr
     ● A few legacy services use Lucene directly
     ● Running all indices on one ES cluster
     ● Kuromoji for Japanese content
  4. Debugging
     Mostly debugging of queries
     ● “Why doesn’t doc X match query Y?”
     ● “Why does this search return no results?”
     Operational issues are very rare
     ● ES’s clustering magic is surprisingly stable!
     ● No performance issues so far
  5. Debugging - Step 1
     Check for typos!
     ES will silently ignore many typos in settings/mapping definitions
  6. Typo - Example
     Let’s create a new index...

         $ curl -X PUT localhost:9200/myapp -d '{
           "settings": {
             "number_of_shards": 3
           },
           "mapping" : {
             "article" : {
               "_source": { "enabled": false },
               "properties": {
                 "title": { "type": "string", "store": "true" },
                 "body": { "type": "string", "store": "true" },
                 ...
               }
             },
             ...
           }
         }'

  7. Typo - Example (cont’d)
     Response from ES:

         {"ok":true,"acknowledged":true}

     OK, seems fine...
  8. Typo - Example (cont’d)
     Now check the mappings...

         $ curl 'localhost:9200/myapp/_mappings?pretty'

     Response from ES:

         {
           "myapp" : { }
         }

     Eh? Where are my lovingly-crafted mappings?!
  9. Typo - Example (cont’d)
     Oops! The key should have been "mappings", not "mapping":

         $ curl -X PUT localhost:9200/myapp -d '{
           "settings": {
             "number_of_shards": 3
           },
           "mappings" : {
             "article" : {
               "_source": { "enabled": false },
               "properties": {
                 "title": { "type": "string", "store": "true" },
                 "body": { "type": "string", "store": "true" },
                 ...
               }
             },
             ...
           }
         }'

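
Because ES accepted the misspelled key without complaint, it can help to check the request body before sending it. The following is a minimal sketch of such a check, not something from the talk: the function name `find_suspicious_keys` is hypothetical, and the set of expected top-level keys is an assumption limited to the two keys used on the slides, not a complete list of what ES accepts.

```python
import difflib
import json

# Assumption: only the two top-level keys used on the slides are expected.
# This is illustrative, not an exhaustive list of keys that ES accepts.
EXPECTED_TOP_LEVEL_KEYS = {"settings", "mappings"}


def find_suspicious_keys(index_body_json):
    """Map each unexpected top-level key to its closest expected key (or None)."""
    body = json.loads(index_body_json)
    suspicious = {}
    for key in body:
        if key not in EXPECTED_TOP_LEVEL_KEYS:
            # Suggest the most similar expected key, if any is close enough.
            close = difflib.get_close_matches(
                key, sorted(EXPECTED_TOP_LEVEL_KEYS), n=1
            )
            suspicious[key] = close[0] if close else None
    return suspicious


# The body from the typo example, shortened to the parts that matter here.
typo_body = '{"settings": {"number_of_shards": 3}, "mapping": {"article": {}}}'
fixed_body = '{"settings": {"number_of_shards": 3}, "mappings": {"article": {}}}'

print(find_suspicious_keys(typo_body))
print(find_suspicious_keys(fixed_body))
```

Running a check like this in a deploy script or a unit test catches the typo before the index is created, rather than after the mappings turn out to be empty.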
  10. Debugging - Step 2
      Set up a local environment
      ● Makes it easy to wipe & rebuild index
  11. Setting up a local env (OSX)

          # Install
          $ brew install elasticsearch
          # Kuromoji plugin (optional)
          $ /usr/local/opt/elasticsearch/bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/1.5.0
          # Start
          $ elasticsearch
          # Create index
          $ curl -X PUT localhost:9200/my_app -d '{ ... }'
          # Insert some documents
          $ curl -X PUT localhost:9200/my_app/my_type/1 -d '{ ... }'
          $ curl -X PUT localhost:9200/my_app/my_type/2 -d '{ ... }'
          # Done!

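
The "wipe & rebuild" cycle from the local setup above can be scripted so it is a single command. Below is a rough sketch using only the Python standard library. The helper names (`build_rebuild_requests`, `send_all`), the sample settings, and the sample documents are all assumptions made for illustration; the URL layout (`/my_app/my_type/1`) simply mirrors the curl commands on the slide and may not suit other ES versions.

```python
import json
import urllib.error
import urllib.request


def build_rebuild_requests(base_url, index, index_body, docs, doc_type="my_type"):
    """Build the HTTP requests that drop, recreate and refill an index."""

    def make_request(method, path, body=None):
        data = json.dumps(body).encode("utf-8") if body is not None else None
        return urllib.request.Request(
            f"{base_url}/{path}",
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )

    requests = [
        make_request("DELETE", index),           # wipe the old index
        make_request("PUT", index, index_body),  # recreate it
    ]
    for doc_id, doc in docs.items():             # insert the documents
        requests.append(make_request("PUT", f"{index}/{doc_type}/{doc_id}", doc))
    return requests


def send_all(requests):
    """Send the requests in order (needs a running local ES server)."""
    for request in requests:
        try:
            with urllib.request.urlopen(request) as response:
                print(request.get_method(), request.full_url, response.status)
        except urllib.error.HTTPError as err:
            # A 404 on DELETE only means the index did not exist yet.
            if not (request.get_method() == "DELETE" and err.code == 404):
                raise


# Hypothetical index settings and documents, for illustration only.
plan = build_rebuild_requests(
    "http://localhost:9200",
    "my_app",
    {"settings": {"number_of_shards": 3}},
    {1: {"title": "first"}, 2: {"title": "second"}},
)
for request in plan:
    print(request.get_method(), request.full_url)
```

Building the requests separately from sending them makes it possible to inspect the plan without a server; calling `send_all(plan)` performs the rebuild against a local node.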
  12. Useful commands - Analyze
      How is my document/query being tokenized?

          $ curl 'localhost:9200/myindex/_analyze?pretty' \
              -d '東京特許許可局許可局長'
          {
            "tokens" : [ {
              "token" : "東京",
              "start_offset" : 0,
              "end_offset" : 2,
              "type" : "word",
              "position" : 1
            }, {
              "token" : "特許",
              "start_offset" : 2,
              "end_offset" : 4,
              "type" : "word",
              ...

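
When comparing how a document field and a query are tokenized, the full `_analyze` response is noisy; usually only the token text and offsets matter. The sketch below pulls those out of a response. The function name `tokens_from_analyze` is hypothetical, and the sample response is limited to the two tokens and the fields visible on the slide (the rest of the real response is elided there).

```python
import json

# Sample limited to what the slide shows; the real response has more tokens.
SAMPLE_ANALYZE_RESPONSE = """
{
  "tokens": [
    {"token": "東京", "start_offset": 0, "end_offset": 2, "type": "word", "position": 1},
    {"token": "特許", "start_offset": 2, "end_offset": 4, "type": "word"}
  ]
}
"""


def tokens_from_analyze(response_text):
    """Return (token, start_offset, end_offset) tuples from an _analyze response."""
    response = json.loads(response_text)
    return [
        (entry["token"], entry["start_offset"], entry["end_offset"])
        for entry in response["tokens"]
    ]


for token, start, end in tokens_from_analyze(SAMPLE_ANALYZE_RESPONSE):
    print(f"{start}-{end}: {token}")
```

Running the same helper on the analyzed query and on the analyzed field value makes it easy to see which query terms have no counterpart in the document.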
  13. Useful commands - Explain
      Why does this document (not) match this query?
      Specify the document ID in the URL (123 in this example).

          $ curl 'localhost:9200/kuro/docs/123/_explain?pretty' \
              -d '{ "query": { "term": { "body": "東京" } } }'
          {
            ...
            "matched" : true,
            "explanation" : {
              "value" : 0.375,
              "description" : "weight(body:東京 in 0) [PerFieldSimilarity], result of:",
              "details" : [ {
                "value" : 0.375,
                "description" : "fieldWeight in 0, product of:",
                "details" : [ {
                  "value" : 1.0,
                  "description" : "tf(freq=1.0), with freq of:",
                  "details" : [ {
                    "value" : 1.0,
                    "description" : "termFreq=1.0"
                    ...

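
The explanation is a tree of `value` / `description` / `details` nodes, which gets hard to read once it is a few levels deep. As a small illustration, the sketch below flattens such a tree into indented lines. The function name `flatten_explanation` is hypothetical, and the sample tree contains only the nodes visible on the slide.

```python
# Only the nodes visible on the slide; the real response continues further.
SAMPLE_EXPLANATION = {
    "value": 0.375,
    "description": "weight(body:東京 in 0) [PerFieldSimilarity], result of:",
    "details": [{
        "value": 0.375,
        "description": "fieldWeight in 0, product of:",
        "details": [{
            "value": 1.0,
            "description": "tf(freq=1.0), with freq of:",
            "details": [{"value": 1.0, "description": "termFreq=1.0"}],
        }],
    }],
}


def flatten_explanation(node, depth=0):
    """Return one indented 'value description' line per node, depth-first."""
    lines = [f"{'  ' * depth}{node['value']} {node['description']}"]
    for child in node.get("details", []):
        lines.extend(flatten_explanation(child, depth + 1))
    return lines


print("\n".join(flatten_explanation(SAMPLE_EXPLANATION)))
```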
  14. Tuning queries
      Parameters to tweak
      ● default_operator (AND/OR)
      ● auto_generate_phrase_queries
      ● minimum_should_match
      ● Stop words/tags
      ● Kuromoji
        ○ Segmentation mode
        ○ Reading form filter
        ○ Disable Kuromoji! (for some fields)
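
The first three parameters in the list above are options of the `query_string` query. To make the tuning knobs concrete, here is a minimal sketch that builds such a query body. The function name `build_query_string_query` and the default field `body` are assumptions for illustration, and which options are available depends on the ES version in use.

```python
import json


def build_query_string_query(
    text,
    field="body",                      # assumption: field name for illustration
    default_operator="OR",
    auto_generate_phrase_queries=False,
    minimum_should_match=None,
):
    """Build a query_string request body with the tunable parameters exposed."""
    params = {
        "query": text,
        "default_field": field,
        "default_operator": default_operator,
        "auto_generate_phrase_queries": auto_generate_phrase_queries,
    }
    # Only include minimum_should_match when the caller asked for it.
    if minimum_should_match is not None:
        params["minimum_should_match"] = minimum_should_match
    return {"query": {"query_string": params}}


strict = build_query_string_query("東日本病院", default_operator="AND")
relaxed = build_query_string_query("東日本病院", minimum_should_match="75%")

print(json.dumps(strict, ensure_ascii=False, indent=2))
print(json.dumps(relaxed, ensure_ascii=False, indent=2))
```

Keeping the parameters in one function makes it straightforward to try a different combination and rerun the same set of test queries against it.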
  15. Why disable Kuromoji?
      Problem: occasionally weird tokenization
      ● AND query will fail, because not all query terms appear in the document (e.g. 東日 and 本 for the query 東日本)
      ● OR query will match any document with 病院 → low precision

      | Phrase | Terms |
      |---|---|
      | 特定医療法人財団 日本会 東日本病院 (document field) | 特定、医療、法人、財団、日本、会、東日本、病院 |
      | 東日本 (query) | 東日、東日本、本 |
      | 東日本病院 (query) | 東、東日本、日本、病院 |

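
The AND/OR behaviour described above can be reproduced with plain set membership on the terms from the slide. This is a toy model only: it ignores scoring and everything else ES does, the function name `matches` is hypothetical, and the second document (containing just 病院) is an invented example of an unrelated hit.

```python
# Terms copied from the slide.
DOC_TERMS = {"特定", "医療", "法人", "財団", "日本", "会", "東日本", "病院"}
QUERY_TERMS = {
    "東日本": ["東日", "東日本", "本"],
    "東日本病院": ["東", "東日本", "日本", "病院"],
}

# Hypothetical unrelated document that only mentions "hospital" (病院).
OTHER_DOC_TERMS = {"病院"}


def matches(query_terms, doc_terms, operator):
    """AND: every query term must be in the document. OR: any one suffices."""
    hits = [term in doc_terms for term in query_terms]
    return all(hits) if operator == "AND" else any(hits)


for query, terms in QUERY_TERMS.items():
    missing = [term for term in terms if term not in DOC_TERMS]
    print(query, "AND:", matches(terms, DOC_TERMS, "AND"), "missing:", missing)
    print(query, "OR :", matches(terms, DOC_TERMS, "OR"))

# The OR query also matches the unrelated document, which hurts precision.
print(matches(QUERY_TERMS["東日本病院"], OTHER_DOC_TERMS, "OR"))
```

Both queries fail under AND because the tokenizer produced query terms (東日, 本, 東) that never appear among the document's terms, while OR is satisfied by any single overlapping term.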
  16. Useful plugin - Head

          $ bin/plugin -install mobz/elasticsearch-head

      http://mobz.github.io/elasticsearch-head/
  17. Testing
      Main goal: Ensure that queries return the results that we expect
      ● Test coverage of representative queries
        ○ Freedom to tune for a given query without breaking other queries
      Ideally, tests should:
      ● Run fast
      ● Run standalone (i.e. no need to have an ES server running)
  18. Testing - Java
      elasticsearch-test is awesome
      ● DSL to set up/tear down ES
      ● Annotations + JUnit runner
      ● ES runs in-process
        ○ No need to start an external ES server
      ● Index is stored in-memory
        ○ Runs quickly
      https://github.com/tlrx/elasticsearch-test
  19. Testing - Java
      Simple elasticsearch-test example
      https://github.com/cb372/elasticsearch-test-example
  20. Testing - Ruby
      Simple Rails + Tire + RSpec example
      https://github.com/cb372/elasticsearch-rspec-example
  21. We’re hiring!
      TODO We are hiring slide
      http://bit.ly/m3jobs