Terms of endearment - the ElasticSearch Query DSL explained

  1. “Terms of Endearment”: The ElasticSearch query language explained. Clinton Gormley (DRTECH, @clintongormley), YAPC::EU 2011
  2. We can search for: “DELETE QUERY”
  3. We can search for: “DELETE QUERY” and find: “deleteByQuery”
  4. but you can only find what is stored in the database
  5. Normalise values: “deleteByQuery” ⇒ 'delete', 'by', 'query', 'deletebyquery'
  6. Normalise values and search terms: “deleteByQuery” ⇒ 'delete', 'by', 'query', 'deletebyquery'; “DELETE QUERY” matches the terms 'delete' and 'query'
  8. Analyse values and search terms: “deleteByQuery” ⇒ 'delete', 'by', 'query', 'deletebyquery'; “DELETE QUERY” matches the terms 'delete' and 'query'
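The analysis step can be sketched in plain Perl. This is a toy analyzer for illustration, not ElasticSearch's own analysis chain: the `analyze` function and its camelCase-splitting regex are assumptions made for this example.

```perl
use strict;
use warnings;

# Toy analyzer (hypothetical): split on whitespace, break camelCase words
# into parts, lowercase everything, and keep the whole lowercased word as
# an extra term whenever it was split.
sub analyze {
    my ($text) = @_;
    my @terms;
    for my $word (split ' ', $text) {
        my @parts = $word =~ /[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+/g;
        push @terms, map { lc $_ } @parts;
        push @terms, lc $word if @parts > 1;
    }
    return @terms;
}

my @stored = analyze('deleteByQuery');   # terms kept in the index
my @wanted = analyze('DELETE QUERY');    # terms derived from the search

# A search term can only match if the same term was stored.
my %in_index = map { ($_, 1) } @stored;
my @matched  = grep { $in_index{$_} } @wanted;

print "stored:  @stored\n";
print "wanted:  @wanted\n";
print "matched: @matched\n";
```

Because the value and the search string pass through the same function, 'DELETE QUERY' ends up looking for terms that really are in the index.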
  9. What is stored in ElasticSearch?
  10. Document: { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org' }, tags => ["perl", "opinion"], posts => 2 }
  11. Fields: { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org' }, tags => ["perl", "opinion"], posts => 2 }
  12. Values: { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org' }, tags => ["perl", "opinion"], posts => 2 }
  13. Field types: the document is an object; tweet ⇒ string; posted ⇒ date; user ⇒ nested object; name ⇒ string; email ⇒ string; tags ⇒ array of enums; posts ⇒ integer
  14. Nested objects flattened: { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org' }, tags => ["perl", "opinion"], posts => 2 }
  15. Nested objects flattened: { tweet => "Perl is GREAT!", posted => "2011-08-15", 'user.name' => "Clinton Gormley", 'user.email' => 'drtech@cpan.org', tags => ["perl", "opinion"], posts => 2 }
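A minimal sketch of the flattening step, assuming a hypothetical `flatten` helper (ElasticSearch does this internally; nothing here is its API):

```perl
use strict;
use warnings;

# Hypothetical helper: turn nested hashes into dotted field names,
# mirroring the user.name / user.email fields on the slide.
sub flatten {
    my ($doc, $prefix) = @_;
    my %flat;
    for my $key (keys %$doc) {
        my $name  = defined $prefix ? "$prefix.$key" : $key;
        my $value = $doc->{$key};
        if (ref $value eq 'HASH') {
            %flat = (%flat, flatten($value, $name));   # recurse into objects
        }
        else {
            $flat{$name} = $value;                     # scalars and arrays stay as they are
        }
    }
    return %flat;
}

my %flat = flatten({
    tweet  => 'Perl is GREAT!',
    posted => '2011-08-15',
    user   => { name => 'Clinton Gormley', email => 'drtech@cpan.org' },
    tags   => [ 'perl', 'opinion' ],
    posts  => 2,
});

print "$_\n" for sort keys %flat;
```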
  16. Values analyzed into terms: { tweet => "Perl is GREAT!", posted => "2011-08-15", 'user.name' => "Clinton Gormley", 'user.email' => 'drtech@cpan.org', tags => ["perl", "opinion"], posts => 2 }
  17. Values analyzed into terms: { tweet => ['perl', 'great'], posted => [Date(2011-08-15)], 'user.name' => ['clinton', 'gormley'], 'user.email' => ['drtech', 'cpan.org'], tags => ['perl', 'opinion'], posts => [2] }
  18. In MySQL: database ⇒ many tables; table ⇒ many rows, one schema; row ⇒ many columns
  19. In ElasticSearch: index ⇒ many types; type ⇒ many documents, one mapping; document ⇒ many fields
  20. Create index with mappings: $es->create_index( index => 'twitter', mappings => { tweet => { properties => { title => { type => 'string' }, created => { type => 'date' } } } } );
  21. Add a mapping: $es->put_mapping( index => 'twitter', type => 'user', mapping => { properties => { name => { type => 'string' }, created => { type => 'date' } } } );
  22. Can add to an existing mapping
  23. Can add to an existing mapping; cannot change the mapping of an existing field
  24. Core field types: { type => 'string' }
  25. Core field types: { type => 'string' }, where type can also be byte | short | integer | long | double | float, date, IP address, geolocation, boolean, or binary (as base64)
  26. Core field types: { type => 'string', index => 'analyzed' } ('Foo Bar' ⇒ [ 'foo', 'bar' ])
  27. Core field types: { type => 'string', index => 'not_analyzed' } ('Foo Bar' ⇒ [ 'Foo Bar' ])
  28. Core field types: { type => 'string', index => 'no' } ('Foo Bar' ⇒ [ ])
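The three `index` settings can be illustrated with a small sketch. The `indexed_terms` helper is hypothetical, and its 'analyzed' branch is only a crude stand-in (lowercase, split on whitespace) for a real analyzer.

```perl
use strict;
use warnings;

# Hypothetical helper: which terms end up in the index for each
# 'index' setting of a string field.
sub indexed_terms {
    my ($value, $index) = @_;
    return map { lc $_ } split ' ', $value if $index eq 'analyzed';
    return ($value)                        if $index eq 'not_analyzed';
    return ()                              if $index eq 'no';
    die "unknown index setting: $index";
}

for my $index (qw(analyzed not_analyzed no)) {
    my @terms = indexed_terms('Foo Bar', $index);
    printf "%-12s => [%s]\n", $index, join ', ', map { sprintf "'%s'", $_ } @terms;
}
```

A field indexed with 'no' produces no terms at all, so it cannot be searched on.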
  29. Core field types: { type => 'string', index => 'analyzed', analyzer => 'default' }
  30. Core field types: { type => 'string', index => 'analyzed', index_analyzer => 'default', search_analyzer => 'default' }
  31. Core field types: { type => 'string', index => 'analyzed', analyzer => 'default', boost => 2 }
  32. Core field types: { type => 'string', index => 'analyzed', analyzer => 'default', boost => 2, include_in_all => 1 } (include_in_all is 1 or 0)
  33. Built-in analyzers: Standard, Simple, Whitespace, Stop, Keyword, Pattern, Language, Snowball, Custom
  41. Input: The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com
      keyword: The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com
      whitespace: The, Brown-Cow's, Part_No., #A.BC123-456, joe@bloggs.com
      simple: the, brown, cow, s, part, no, a, bc, joe, bloggs, com
      standard: brown, cow's, part_no, a.bc123, 456, joe, bloggs.com
      snowball (English): brown, cow, part_no, a.bc123, 456, joe, bloggs.com
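Three of these analyzers are simple enough to approximate in a few lines of Perl. The subs below are rough imitations written for this example, not the Lucene implementations; the standard and snowball analyzers involve stop words and stemming and are not attempted here.

```perl
use strict;
use warnings;

my $text = q{The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com};

# Rough approximations of three built-in analyzers (illustration only).
my %analyzer = (
    keyword    => sub { ($_[0]) },                 # whole value as a single term
    whitespace => sub { split ' ', $_[0] },        # split on whitespace only
    simple     => sub { lc($_[0]) =~ /[a-z]+/g },  # runs of letters, lowercased
);

for my $name (qw(keyword whitespace simple)) {
    my @terms = $analyzer{$name}->($text);
    print "$name: ", join(', ', @terms), "\n";
}
```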
  42. Token filters: Standard, ASCII Folding, Length, Lowercase, NGram, Edge NGram, Porter Stem, Shingle, Stop, Word Delimiter, Stemmer, KStem, Snowball, Phonetic, Synonym, Compound Word, Reverse, Elision, Truncate, Unique
  61. Custom analyzer: $es->create_index( index => 'twitter', settings => { analysis => { analyzer => { ascii_html => { type => 'custom', tokenizer => 'standard', filter => [ qw( standard lowercase asciifolding stop ) ], char_filter => ['html_strip'] } } } } );
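To show how such a chain fits together (char filter, then tokenizer, then token filters in order), here is a toy pipeline in plain Perl. Every stage is a simplified stand-in: the tag-stripping regex, the whitespace tokenizer and the stop-word list are assumptions, and the 'standard' token filter is skipped.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Assumed stop list, far shorter than a real one.
my %stop = map { ($_, 1) } qw(a an and is of the to);

sub ascii_html {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>/ /g;      # char_filter: html_strip
    my @tokens = split ' ', $text;            # tokenizer (whitespace only here)
    @tokens = map { lc $_ } @tokens;          # filter: lowercase
    for my $token (@tokens) {                 # filter: asciifolding
        $token = NFD($token);                 #   decompose accented characters
        $token =~ s/\p{Mn}//g;                #   then drop the combining marks
    }
    return grep { !$stop{$_} } @tokens;       # filter: stop
}

my @terms = ascii_html("<p>The caf\x{e9} is <b>OPEN</b></p>");
print join(', ', @terms), "\n";
```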
  62. Searching: $result = $es->search( index => 'twitter', type => 'tweet' );
  63. Searching: $result = $es->search( index => ['twitter', 'facebook'], type => ['tweet', 'post'] );
  64. Searching: $result = $es->search(); (all indices, all types)
  65. Searching: $result = $es->search( index => 'twitter', type => 'tweet', query => { text => { _all => 'foo' } } );
  66. Searching: $result = $es->search( index => 'twitter', type => 'tweet', queryb => 'foo' ); (the b in queryb stands for ElasticSearch::SearchBuilder)
  67. Searching: $result = $es->search( index => 'twitter', type => 'tweet', query => { text => { _all => 'foo' } }, sort => [ { _score => 'desc' } ] );
  68. Searching: $result = $es->search( index => 'twitter', type => 'tweet', query => { text => { _all => 'foo' } }, sort => [ { _score => 'desc' } ], from => 0, size => 10 );
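The from and size parameters page through results. A small sketch, assuming a hypothetical `page_params` helper that converts a 1-based page number into those two values; the hash it builds is what would be passed to $es->search.

```perl
use strict;
use warnings;

# Hypothetical helper: translate a 1-based page number into from/size.
sub page_params {
    my ($page, $size) = @_;
    $size ||= 10;
    $page = 1 if !$page || $page < 1;
    return ( from => ($page - 1) * $size, size => $size );
}

my %search = (
    index => 'twitter',
    type  => 'tweet',
    query => { text => { _all => 'foo' } },
    sort  => [ { _score => 'desc' } ],
    page_params(3, 10),                 # third page of ten results
);

print "from=$search{from} size=$search{size}\n";
```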
  69. Query DSL
  70. Queries vs Filters
  71. Queries vs Filters. Queries: full text & terms. Filters: terms only.
  72. Queries vs Filters. Queries: full text & terms, relevance scoring. Filters: terms only, no scoring.
  75. Queries vs Filters. Queries: full text & terms, relevance scoring, slower. Filters: terms only, no scoring, faster.
  80. Queries vs Filters. Queries: full text & terms, relevance scoring, slower, no caching. Filters: terms only, no scoring, faster, cacheable.
  87. Queries vs Filters. Queries: full text & terms, relevance scoring, slower, no caching. Filters: terms only, no scoring, faster, cacheable. Use filters for anything that doesn't affect the relevance score!
  94. Query only. Query DSL: $es->search( query => { text => { title => 'perl' } } ); SearchBuilder: $es->search( queryb => { title => 'perl' } );
  95. Filter only. Query DSL: $es->search( query => { constant_score => { filter => { term => { tag => 'perl' } } } } ); SearchBuilder: $es->search( queryb => { -filter => { tag => 'perl' } } );
  96. Query and filter. Query DSL: $es->search( query => { filtered => { query => { text => { title => 'perl' } }, filter => { term => { tag => 'perl' } } } } ); SearchBuilder: $es->search( queryb => { title => 'perl', -filter => { tag => 'perl' } } );
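The Query DSL is JSON underneath; the Perl hashes on these slides are what gets serialised into the request body. A minimal sketch using the core JSON::PP module (the client library does its own encoding, so this is only to make the wire format visible):

```perl
use strict;
use warnings;
use JSON::PP;

my $query = {
    filtered => {
        query  => { text => { title => 'perl' } },
        filter => { term => { tag   => 'perl' } },
    },
};

# canonical() sorts the keys so the output is stable between runs.
my $json = JSON::PP->new->canonical->encode({ query => $query });
print "$json\n";
```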
  97. Filters
  98. Filters: equality. Query DSL: { term => { tags => 'perl' } }; { terms => { tags => ['perl', 'ruby'] } } SearchBuilder: { tags => 'perl' }; { tags => ['perl', 'ruby'] }
  99. Filters: range. Query DSL: { range => { date => { gte => '2010-11-01', lt => '2010-12-01' } } } SearchBuilder: { date => { gte => '2010-11-01', lt => '2010-12-01' } }
  100. Filters: range (many values). Query DSL: { numeric_range => { date => { gte => '2010-11-01', lt => '2010-12-01' } } } SearchBuilder: { date => { '>=' => '2010-11-01', '<' => '2010-12-01' } }
  101. Filters: and | or | not. Query DSL: { and => [ { term => { X => 1 } }, { term => { Y => 2 } } ] }; { or => [ { term => { X => 1 } }, { term => { Y => 2 } } ] }; { not => { or => [ { term => { X => 1 } }, { term => { Y => 2 } } ] } } SearchBuilder: { X => 1, Y => 2 } (and); [ X => 1, Y => 2 ] (or); { -not => { X => 1, Y => 2 } } (not ... and); { -not => [ X => 1, Y => 2 ] } (not ... or)
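The hash-means-and, array-means-or convention can be sketched as a tiny translator. This is a hypothetical `to_filter` function written for illustration, not the real ElasticSearch::SearchBuilder; it only understands flat field => value pairs.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical translator from the SearchBuilder-style shorthand
# to Query DSL term filters.
sub to_filter {
    my ($spec) = @_;
    my ($op, @pairs);
    if (ref $spec eq 'HASH') {          # { X => 1, Y => 2 } means X and Y
        $op    = 'and';
        @pairs = map { ($_, $spec->{$_}) } sort keys %$spec;
    }
    elsif (ref $spec eq 'ARRAY') {      # [ X => 1, Y => 2 ] means X or Y
        $op    = 'or';
        @pairs = @$spec;
    }
    else {
        die 'expected a hash or array reference';
    }

    my @clauses;
    while (my ($field, $value) = splice @pairs, 0, 2) {
        push @clauses, { term => { $field => $value } };
    }
    # A single clause needs no and/or wrapper.
    return @clauses == 1 ? $clauses[0] : { $op => \@clauses };
}

my $json = JSON::PP->new->canonical;
print $json->encode( to_filter({ X => 1, Y => 2 }) ), "\n";
print $json->encode( to_filter([ X => 1, Y => 2 ]) ), "\n";
```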
  102. Filters: exists | missing. Query DSL: { exists => { field => 'title' } }; { missing => { field => 'title' } } SearchBuilder: { -exists => 'title' }; { -missing => 'title' }
  103. Filter example. SearchBuilder: { -filter => [ featured => 1, { created_at => { gt => '2011-08-01' }, status => { '!=' => 'pending' } } ] }
  104. Filter example. Query DSL: { constant_score => { filter => { or => [ { term => { featured => 1 } }, { and => [ { not => { term => { status => 'pending' } } }, { range => { created_at => { gt => '2011-08-01' } } } ] } ] } } }
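To make the semantics of the filter example concrete, the sketch below evaluates the same or/and/not/term/range structure against three in-memory documents. The `matches` function and the sample documents are invented for illustration; ElasticSearch evaluates filters against its index, not like this.

```perl
use strict;
use warnings;

# Toy evaluator covering only the filter types used in the example.
sub matches {
    my ($filter, $doc) = @_;
    my ($type, $body) = %$filter;    # each filter hash holds exactly one key

    if ($type eq 'term') {
        my ($field, $value) = %$body;
        return (defined $doc->{$field} && $doc->{$field} eq $value) ? 1 : 0;
    }
    if ($type eq 'range') {          # this sketch understands 'gt' only
        my ($field, $limits) = %$body;
        return (defined $doc->{$field} && $doc->{$field} gt $limits->{gt}) ? 1 : 0;
    }
    if ($type eq 'not') {
        return matches($body, $doc) ? 0 : 1;
    }
    if ($type eq 'and') {
        for my $clause (@$body) { return 0 unless matches($clause, $doc) }
        return 1;
    }
    if ($type eq 'or') {
        for my $clause (@$body) { return 1 if matches($clause, $doc) }
        return 0;
    }
    die "unsupported filter: $type";
}

my $filter = { 'or' => [
    { term  => { featured => 1 } },
    { 'and' => [
        { 'not' => { term => { status => 'pending' } } },
        { range => { created_at => { gt => '2011-08-01' } } },
    ] },
] };

# Made-up sample documents.
my %docs = (
    featured_old => { featured => 1, status => 'pending', created_at => '2011-01-01' },
    live_recent  => { featured => 0, status => 'live',    created_at => '2011-08-10' },
    pending_new  => { featured => 0, status => 'pending', created_at => '2011-08-10' },
);

my @hits = grep { matches($filter, $docs{$_}) } sort keys %docs;
print "matching: @hits\n";
```

A document passes if it is featured, or if it is both not pending and created after 2011-08-01.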
  105. Filters: others. script, nested, has_child, query, match_all, prefix, limit, ids, type, geo_distance, geo_distance_range, geo_bbox, geo_polygon
  117. Queries
       Text / Analyzed: text, query_string / field, flt / flt_field, mlt / mlt_field
       Term / Not analyzed: term / terms, range, prefix, fuzzy, wildcard, ids, span queries
       Combining: bool, dis_max, boosting
       Scripting: custom_score, custom_filters_score
       Wrappers: match_all, constant_score, filtered
       “Joins”: nested, has_child, top_children
  151. Text/Analyzed queries: mapping aware
  152. Text/Analyzed queries: not_analyzed ⇒ term query
  153. Text/Analyzed queries: analyzed ⇒ text query using search_analyzer
  154. Text-query family. Query DSL: { text => { title => 'great perl' } } SearchBuilder: { title => 'great perl' }
  155. Text-query family. Query DSL: { text => { title => { query => 'great perl' } } } SearchBuilder: { title => { '=' => { query => 'great perl' } } }
  156. Text-query family. Query DSL: { text => { title => { query => 'great perl', operator => 'and' } } } SearchBuilder: { title => { '=' => { query => 'great perl', operator => 'and' } } }
  157. Text-query family. Query DSL: { text => { title => { query => 'great perl', fuzziness => 0.5 } } } SearchBuilder: { title => { '=' => { query => 'great perl', fuzziness => 0.5 } } }
  158. Text-query family. Query DSL: { text => { title => { query => 'great perl', type => 'phrase' } } } SearchBuilder: { title => { '==' => { query => 'great perl' } } }
  160. Text-query family. Query DSL: { text => { title => { query => 'perl is great', type => 'phrase' } } } SearchBuilder: { title => { '==' => { query => 'perl is great' } } }
  161. Text-query family. Query DSL: { text => { title => { query => 'perl great', type => 'phrase', slop => 3 } } } SearchBuilder: { title => { '==' => { query => 'perl great', slop => 3 } } }
  162. Text-query family. Query DSL: { text => { title => { query => 'perl is gr', type => 'phrase_prefix' } } } SearchBuilder: { title => { '^' => { query => 'perl is gr' } } }
  163. Query string / Field: Lucene query syntax aware, e.g. "perl is great"~5 AND author:clint* -deleted
  164. Query string / Field: syntax errors, e.g. AND perl is great " author : clint* -
  165. Query string / Field: syntax errors, e.g. AND perl is great " author : clint* - (see ElasticSearch::QueryParser)
  166. Combining: Bool. Query DSL: { bool => { must => [ { term => { foo => 1 } }, ... ], must_not => [ { term => { bar => 1 } }, ... ], should => [ { term => { X => 2 } }, { term => { Y => 2 } }, ... ], minimum_number_should_match => 1 } }
  167. Combining: Bool. SearchBuilder: { foo => 1, bar => { '!=' => 1 }, -or => [ X => 2, Y => 2 ] } or { -bool => { must => { foo => 1 }, must_not => { bar => 1 }, should => [ { X => 2 }, { Y => 2 } ], minimum_number_should_match => 1 } }
  168. Combining: DisMax. Query DSL: { dis_max => { queries => [ { term => { foo => 1 } }, { term => { bar => 1 } } ] } } SearchBuilder: { -dis_max => [ { term => { foo => 1 } }, { term => { bar => 1 } } ] }
  169. Bool: combines scores. DisMax: uses highest score from all matching clauses.
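The difference between the two can be shown with plain arithmetic. This is a deliberately simplified sketch: the clause scores are made up, and "combines" is modelled as a plain sum, ignoring any other factors the underlying Lucene scoring applies.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Per-clause scores for one document; the numbers are invented.
my @clause_scores = (0.5, 0.25);

my $bool_score    = sum(@clause_scores);   # bool: matching clauses add up
my $dis_max_score = max(@clause_scores);   # dis_max: only the best clause counts

print "bool: $bool_score, dis_max: $dis_max_score\n";
```

With dis_max, a document that matches the same word in two fields is not rewarded twice, which is why it suits searches across several fields holding similar content.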
  170. Tweaking relevance:
  171. Tweaking relevance: Boosting
  172. Boosting: at index time. { properties => { content => { type => 'string' }, title => { type => 'string' } } }
  173. Boosting: at index time. { properties => { content => { type => 'string' }, title => { type => 'string', boost => 2 } } }
  174. Boosting: at index time. { properties => { content => { type => 'string' }, title => { type => 'string', boost => 2 }, rank => { type => 'integer' } }, _boost => { name => 'rank', null_value => 1.0 } }
  175. Boosting: at search time. Query DSL: { bool => { should => [ { text => { content => 'perl' } }, { text => { title => 'perl' } } ] } } SearchBuilder: { content => 'perl', title => 'perl' }
  176. Boosting: at search time. Query DSL: { bool => { should => [ { text => { content => 'perl' } }, { text => { title => { query => 'perl' } } } ] } } SearchBuilder: { content => 'perl', title => { '=' => { query => 'perl' } } }
  177. Boosting: at search time. Query DSL: { bool => { should => [ { text => { content => 'perl' } }, { text => { title => { query => 'perl', boost => 2 } } } ] } } SearchBuilder: { content => 'perl', title => { '=' => { query => 'perl', boost => 2 } } }
  178. Boosting: custom_score. Query DSL: { custom_score => { query => { text => { title => 'perl' } }, script => "_score * foo / doc['rank'].value" } } SearchBuilder: { -custom_score => { query => { title => 'perl' }, script => "_score * foo / doc['rank'].value" } }
  179. Query example. SearchBuilder: { -or => [ title => { '=' => { query => 'custom score', boost => 2 } }, content => 'custom score' ], -filter => { repo => 'elasticsearch/elasticsearch', created_at => { '>=' => '2011-07-01', '<' => '2011-08-01' }, -or => [ creator_id => 123, assignee_id => 123 ], labels => ['bug', 'breaking'] } }
  180. Query example. Query DSL: { query => { filtered => { query => { bool => { should => [ { text => { content => "custom score" } }, { text => { title => { boost => 2, query => "custom score" } } } ] } }, filter => { and => [ { or => [ { term => { creator_id => 123 } }, { term => { assignee_id => 123 } } ] }, { terms => { labels => ["bug", "breaking"] } }, { term => { repo => "elasticsearch/elasticsearch" } }, { numeric_range => { created_at => { gte => "2011-07-01", lt => "2011-08-01" } } } ] } } } }
  181. https://github.com/clintongormley/GitHubSearch
