Terms of endearment - the ElasticSearch Query DSL explained

An introduction to mapping, analyzers, and how to query ElasticSearch using the Perl API

Presentation Transcript

  • “Terms of Endearment”: The ElasticSearch query language explained. Clinton Gormley (DRTECH, @clintongormley), YAPC::EU 2011
  • We can search for: “DELETE QUERY”
  • We can search for: “DELETE QUERY” and find: “deleteByQuery”
  • but you can only find what is stored in the database
  • Normalise values: “deleteByQuery” ⇒ 'delete', 'by', 'query', 'deletebyquery'
  • Normalise values and search terms: “deleteByQuery” ⇒ 'delete', 'by', 'query', 'deletebyquery'; “DELETE QUERY” matches on the terms 'delete' and 'query'
  • Analyse values and search terms: “deleteByQuery” ⇒ 'delete', 'by', 'query', 'deletebyquery'; “DELETE QUERY” matches on the terms 'delete' and 'query'
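The idea of analysing both the stored value and the search terms can be sketched in plain Perl. This is only an illustration under stated assumptions: the `analyse` helper is hypothetical, it is not how ElasticSearch implements analysis, and it handles nothing beyond whitespace and simple camelCase boundaries.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ElasticSearch): lowercase each
# whitespace-separated chunk, and also split it on camelCase boundaries.
sub analyse {
    my ($text) = @_;
    my %terms;
    for my $chunk ( split /\s+/, $text ) {
        next unless length $chunk;
        $terms{ lc($chunk) } = 1;    # whole chunk, e.g. 'deletebyquery'
        # split where a lowercase letter is followed by an uppercase one
        $terms{ lc($_) } = 1 for split /(?<=[a-z])(?=[A-Z])/, $chunk;
    }
    my @sorted = sort keys %terms;
    return @sorted;
}

my @stored = analyse('deleteByQuery');    # terms kept for the stored value
my @wanted = analyse('DELETE QUERY');     # terms produced from the search

# A search term "hits" only if the same term was stored.
my %index = map { $_ => 1 } @stored;
my @hits  = grep { $index{$_} } @wanted;

print "stored terms:   @stored\n";
print "search terms:   @wanted\n";
print "matching terms: @hits\n";
```

Because both sides go through the same function, “DELETE QUERY” finds “deleteByQuery” even though the raw strings differ.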
  • What is stored in ElasticSearch?
  • Document, made up of fields and values:
        {
            tweet  => "Perl is GREAT!",
            posted => "2011-08-15",
            user   => {
                name  => "Clinton Gormley",
                email => 'drtech@cpan.org',
            },
            tags   => ["perl", "opinion"],
            posts  => 2,
        }
  • Field types:
        {                                     # object
            tweet  => "Perl is GREAT!",       # string
            posted => "2011-08-15",           # date
            user   => {                       # nested object
                name  => "Clinton Gormley",   # string
                email => 'drtech@cpan.org',   # string
            },
            tags   => ["perl", "opinion"],    # array of enums
            posts  => 2,                      # integer
        }
  • Nested objects flattened:
        {
            tweet        => "Perl is GREAT!",
            posted       => "2011-08-15",
            'user.name'  => "Clinton Gormley",
            'user.email' => 'drtech@cpan.org',
            tags         => ["perl", "opinion"],
            posts        => 2,
        }
  • Values analyzed into terms:
        {
            tweet        => ['perl', 'great'],
            posted       => [Date(2011-08-15)],
            'user.name'  => ['clinton', 'gormley'],
            'user.email' => ['drtech', 'cpan.org'],
            tags         => ['perl', 'opinion'],
            posts        => [2],
        }
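The flattening step shown above can be approximated in a few lines of Perl. This is a rough sketch, not ElasticSearch's own code: `flatten` is a hypothetical helper, and it only descends into hashes (arrays are kept as values).

```perl
use strict;
use warnings;

# Hypothetical helper: turn nested hashes into dotted field names,
# e.g. { user => { name => ... } } becomes { 'user.name' => ... }.
sub flatten {
    my ( $doc, $prefix ) = @_;
    my %flat;
    for my $key ( keys %$doc ) {
        my $name  = defined $prefix ? "$prefix.$key" : $key;
        my $value = $doc->{$key};
        if ( ref($value) eq 'HASH' ) {
            # recurse into the nested object, carrying the dotted prefix
            %flat = ( %flat, flatten( $value, $name ) );
        }
        else {
            $flat{$name} = $value;
        }
    }
    return %flat;
}

my %flat = flatten({
    tweet  => 'Perl is GREAT!',
    posted => '2011-08-15',
    user   => { name => 'Clinton Gormley', email => 'drtech@cpan.org' },
    tags   => [ 'perl', 'opinion' ],
    posts  => 2,
});

print "$_\n" for sort keys %flat;
```

The printed field names are `posted`, `posts`, `tags`, `tweet`, `user.email` and `user.name`, which matches the flattened document on the slide.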
  • In MySQL
    database ⇒ many tables
    table ⇒ many rows, one schema
    row ⇒ many columns
  • In ElasticSearch
    index ⇒ many types
    type ⇒ many documents, one mapping
    document ⇒ many fields
  • Create index with mappings
        $es->create_index(
            index    => 'twitter',
            mappings => {
                tweet => {
                    properties => {
                        title   => { type => 'string' },
                        created => { type => 'date' },
                    }
                }
            }
        );
  • Add a mapping
        $es->put_mapping(
            index   => 'twitter',
            type    => 'user',
            mapping => {
                properties => {
                    name    => { type => 'string' },
                    created => { type => 'date' },
                }
            }
        );
  • Can add to existing mapping
    Cannot change mapping for field
  • Core field types
        {
            type => 'string',
            # byte|short|integer|long|double|float
            # date, ip addr, geolocation
            # boolean
            # binary (as base 64)
        }
  • Core field types
        {
            type  => 'string',
            index => 'analyzed',        # 'Foo Bar' ⇒ [ 'foo', 'bar' ]
        }
  • Core field types
        {
            type  => 'string',
            index => 'not_analyzed',    # 'Foo Bar' ⇒ [ 'Foo Bar' ]
        }
  • Core field types
        {
            type  => 'string',
            index => 'no',              # 'Foo Bar' ⇒ [ ]
        }
  • Core field types
        {
            type     => 'string',
            index    => 'analyzed',
            analyzer => 'default',
        }
  • Core field types
        {
            type            => 'string',
            index           => 'analyzed',
            index_analyzer  => 'default',
            search_analyzer => 'default',
        }
  • Core field types
        {
            type           => 'string',
            index          => 'analyzed',
            analyzer       => 'default',
            boost          => 2,
            include_in_all => 1,        # or 0
        }
  • Built in analyzers
    • Standard
    • Simple
    • Whitespace
    • Stop
    • Keyword
    • Pattern
    • Language
    • Snowball
    • Custom
  • The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com
    keyword: The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com
    whitespace: The, Brown-Cow's, Part_No., #A.BC123-456, joe@bloggs.com
    simple: the, brown, cow, s, part, no, a, bc, joe, bloggs, com
    standard: brown, cow's, part_no, a.bc123, 456, joe, bloggs.com
    snowball (English): brown, cow, part_no, a.bc123, 456, joe, bloggs.com
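Three of the simpler analyzers above can be imitated with plain Perl to see where their output comes from. This is a sketch only: the three subs are hypothetical stand-ins, the real analyzers are implemented in Lucene, and the standard and snowball analyzers are not attempted here.

```perl
use strict;
use warnings;

my $text = q{The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com};

# keyword: the whole value is kept as a single term
sub keyword_terms {
    my ($value) = @_;
    return ($value);
}

# whitespace: split on whitespace only, keep case and punctuation
sub whitespace_terms {
    my ($value) = @_;
    return grep { length($_) } split /\s+/, $value;
}

# simple: split on anything that is not a letter, then lowercase
sub simple_terms {
    my ($value) = @_;
    return map { lc($_) } grep { length($_) } split /[^A-Za-z]+/, $value;
}

print 'keyword:    ', join( ', ', keyword_terms($text) ),    "\n";
print 'whitespace: ', join( ', ', whitespace_terms($text) ), "\n";
print 'simple:     ', join( ', ', simple_terms($text) ),     "\n";
```

For this input the three subs reproduce the keyword, whitespace and simple lines from the slide.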
  • Token filters
    • Standard
    • ASCII Folding
    • Length
    • Lowercase
    • NGram
    • Edge NGram
    • Porter Stem
    • Shingle
    • Stop
    • Word Delimiter
    • Stemmer
    • KStem
    • Snowball
    • Phonetic
    • Synonym
    • Compound Word
    • Reverse
    • Elision
    • Truncate
    • Unique
  • Custom Analyzer
        $es->create_index(
            index    => 'twitter',
            settings => {
                analysis => {
                    analyzer => {
                        ascii_html => {
                            type        => 'custom',
                            tokenizer   => 'standard',
                            filter      => [ qw( standard lowercase asciifolding stop ) ],
                            char_filter => ['html_strip'],
                        }
                    }
                }
            }
        );
  • Searching
        $result = $es->search(
            index => 'twitter',
            type  => 'tweet',
        );
  • Searching
        $result = $es->search(
            index => ['twitter','facebook'],
            type  => ['tweet','post'],
        );
  • Searching
        $result = $es->search(
            # all indices
            # all types
        );
  • Searching
        $result = $es->search(
            index => 'twitter',
            type  => 'tweet',
            query => { text => { _all => 'foo' }},
        );
  • Searching
        $result = $es->search(
            index  => 'twitter',
            type   => 'tweet',
            queryb => 'foo',    # b == ElasticSearch::SearchBuilder
        );
  • Searching
        $result = $es->search(
            index => 'twitter',
            type  => 'tweet',
            query => { text => { _all => 'foo' }},
            sort  => [{ '_score' => 'desc' }],
            from  => 0,
            size  => 10,
        );
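The `from` and `size` parameters in the last search are what paging is built on. A minimal sketch, assuming 1-based page numbers; `page_params` is a hypothetical helper and not part of the ElasticSearch Perl API:

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 1-based page number and a page size
# into the from/size pair that a search request expects.
sub page_params {
    my ( $page, $size ) = @_;
    return ( from => ( $page - 1 ) * $size, size => $size );
}

my %first = page_params( 1, 10 );    # from => 0,  size => 10
my %third = page_params( 3, 10 );    # from => 20, size => 10

print "page 1: from=$first{from} size=$first{size}\n";
print "page 3: from=$third{from} size=$third{size}\n";
```

The returned pairs could be spliced into the argument list of a search call alongside `index`, `type` and `query`.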
  • Query DSL
  • Queries vs Filters
    Queries:
    • full text & terms
    • relevance scoring
    • slower
    • no caching
    Filters:
    • terms only
    • no scoring
    • faster
    • cacheable
    Use filters for anything that doesn't affect the relevance score!
  • Query only
    Query DSL:
        $es->search(
            query => { text => { title => 'perl' } }
        );
    SearchBuilder:
        $es->search(
            queryb => { title => 'perl' }
        );
  • Filter only
    Query DSL:
        $es->search(
            query => {
                constant_score => {
                    filter => { term => { tag => 'perl' } }
                }
            }
        );
    SearchBuilder:
        $es->search(
            queryb => { -filter => { tag => 'perl' } }
        );
  • Query and filter
    Query DSL:
        $es->search(
            query => {
                filtered => {
                    query  => { text => { title => 'perl' } },
                    filter => { term => { tag => 'perl' } },
                }
            }
        );
    SearchBuilder:
        $es->search(
            queryb => {
                title   => 'perl',
                -filter => { tag => 'perl' },
            }
        );
  • Filters
  • Filters : equality
    Query DSL:
        { term  => { tags => 'perl' }}
        { terms => { tags => ['perl','ruby'] }}
    SearchBuilder:
        { tags => 'perl' }
        { tags => ['perl','ruby'] }
  • Filters : range
    Query DSL:
        { range => { date => { gte => '2010-11-01', lt => '2010-12-01' }}}
    SearchBuilder:
        { date => { gte => '2010-11-01', lt => '2010-12-01' }}
  • Filters : range (many values)
    Query DSL:
        { numeric_range => { date => { gte => '2010-11-01', lt => '2010-12-01' }}}
    SearchBuilder:
        { date => { '>=' => '2010-11-01', '<' => '2010-12-01' }}
  • Filters : and | or | not
    Query DSL:
        { and => [ {term=>{X=>1}}, {term=>{Y=>2}} ]}
        { or  => [ {term=>{X=>1}}, {term=>{Y=>2}} ]}
        { not => { or => [ {term=>{X=>1}}, {term=>{Y=>2}} ] }}
    SearchBuilder:
        { X => 1, Y => 2 }
        [ X => 1, Y => 2 ]
        { -not => { X => 1, Y => 2 } }    # and
        { -not => [ X => 1, Y => 2 ] }    # or
  • Filters : exists | missing
    Query DSL:
        { exists  => { field => 'title' }}
        { missing => { field => 'title' }}
    SearchBuilder:
        { -exists  => 'title' }
        { -missing => 'title' }
  • Filter example
    SearchBuilder:
        { -filter => [
            featured => 1,
            {
                created_at => { gt => '2011-08-01' },
                status     => { '!=' => 'pending' },
            },
        ]}
  • Filter example
    Query DSL:
        { constant_score => {
            filter => {
                or => [
                    { term => { featured => 1 }},
                    { and => [
                        { not => { term => { status => 'pending' }}},
                        { range => { created_at => { gt => '2011-08-01' }}},
                    ]}
                ]
            }
        }}
  • Filters : others
    • script
    • nested
    • has_child
    • query
    • match_all
    • prefix
    • limit
    • ids
    • type
    • geo_distance
    • geo_distance_range
    • geo_bbox
    • geo_polygon
  • Queries
    Text / Analyzed:
    • text
    • query_string / field
    • flt / flt_field
    • mlt / mlt_field
    Term / Not analyzed:
    • term / terms
    • range
    • prefix
    • fuzzy
    • wildcard
    • ids
    • span queries
    Combining:
    • bool
    • dis_max
    • boosting
    Scripting:
    • custom_score
    • custom_filters_score
    Wrappers:
    • match_all
    • constant_score
    • filtered
    “Joins”:
    • nested
    • has_child
    • top_children
  • Text/Analyzed Queries: mapping aware
  • Text/Analyzed Queries: not_analyzed ⇒ term query
  • Text/Analyzed Queries: analyzed ⇒ text query using search_analyzer
  • Text-Query Family
    Query DSL:
        { text => { title => 'great perl' }}
    SearchBuilder:
        { title => 'great perl' }
  • Text-Query Family
    Query DSL:
        { text => { title => { query => 'great perl' }}}
    SearchBuilder:
        { title => { '=' => { query => 'great perl' }}}
  • Text-Query Family
    Query DSL:
        { text => { title => { query => 'great perl', operator => 'and' }}}
    SearchBuilder:
        { title => { '=' => { query => 'great perl', operator => 'and' }}}
  • Text-Query Family
    Query DSL:
        { text => { title => { query => 'great perl', fuzziness => 0.5 }}}
    SearchBuilder:
        { title => { '=' => { query => 'great perl', fuzziness => 0.5 }}}
  • Text-Query Family
    Query DSL:
        { text => { title => { query => 'great perl', type => 'phrase' }}}
    SearchBuilder:
        { title => { '==' => { query => 'great perl' }}}
  • Text-Query Family
    Query DSL:
        { text => { title => { query => 'perl is great', type => 'phrase' }}}
    SearchBuilder:
        { title => { '==' => { query => 'perl is great' }}}
  • Text-Query Family
    Query DSL:
        { text => { title => { query => 'perl great', type => 'phrase', slop => 3 }}}
    SearchBuilder:
        { title => { '==' => { query => 'perl great', slop => 3 }}}
  • Text-Query Family
    Query DSL:
        { text => { title => { query => 'perl is gr', type => 'phrase_prefix' }}}
    SearchBuilder:
        { title => { '^' => { query => 'perl is gr' }}}
  • Query string / Field
    Lucene Query Syntax aware:
        "perl is great"~5 AND author:clint* -deleted
  • Query string / Field
    Syntax errors:
        AND perl is great " author : clint* -
    ElasticSearch::QueryParser
  • Combining: Bool
    Query DSL:
        { bool => {
            must     => [ { term => { foo => 1}}, ... ],
            must_not => [ { term => { bar => 1}}, ... ],
            should   => [ { term => { X => 2}}, { term => { Y => 2}}, ... ],
            minimum_number_should_match => 1,
        }}
  • Combining: Bool
    SearchBuilder:
        {
            foo => 1,
            bar => { '!=' => 1 },
            -or => [ X => 2, Y => 2 ],
        }
        { -bool => {
            must     => { foo => 1 },
            must_not => { bar => 1 },
            should   => [ { X => 2 }, { Y => 2 } ],
            minimum_number_should_match => 1,
        }}
  • Combining: DisMax
    Query DSL:
        { dis_max => {
            queries => [
                { term => { foo => 1}},
                { term => { bar => 1}},
            ]
        }}
    SearchBuilder:
        { -dis_max => [
            { term => { foo => 1}},
            { term => { bar => 1}},
        ]}
  • Bool: combines scores
    DisMax: uses highest score from all matching clauses
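The difference between the two can be shown with a toy calculation. This is a deliberately simplified sketch: the clause scores are made up, and real scoring involves more than this (for example, `bool` also applies a coordination factor and `dis_max` accepts a `tie_breaker`).

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Made-up scores of the clauses that matched a single document.
my @clause_scores = ( 0.4, 1.5 );

# bool-like: the scores of all matching clauses are added together
my $bool_like = sum(@clause_scores);

# dis_max-like: only the best-scoring clause counts
my $dis_max_like = max(@clause_scores);

printf "bool-like score:    %.1f\n", $bool_like;
printf "dis_max-like score: %.1f\n", $dis_max_like;
```

A document that matches weakly on many clauses can therefore outrank a single strong match under `bool`, but not under `dis_max`.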
  • Tweaking relevance: Boosting
  • Boosting: at index time
        {
            properties => {
                content => { type => 'string' },
                title   => { type => 'string', boost => 2 },
            },
        }
  • Boosting: at index time
        {
            properties => {
                content => { type => 'string' },
                title   => { type => 'string', boost => 2 },
                rank    => { type => 'integer' },
            },
            _boost => { name => 'rank', null_value => 1.0 },
        }
  • Boosting: at search time
    Query DSL:
        { bool => {
            should => [
                { text => { content => 'perl' }},
                { text => { title   => 'perl' }},
            ]
        }}
    SearchBuilder:
        { content => 'perl', title => 'perl' }
  • Boosting: at search time
    Query DSL:
        { bool => {
            should => [
                { text => { content => 'perl' }},
                { text => { title => { query => 'perl', boost => 2 }}},
            ]
        }}
    SearchBuilder:
        {
            content => 'perl',
            title   => { '=' => { query => 'perl', boost => 2 }},
        }
  • Boosting: custom_score
    Query DSL:
        { custom_score => {
            query  => { text => { title => 'perl' }},
            script => "_score * foo / doc['rank'].value",
        }}
    SearchBuilder:
        { -custom_score => {
            query  => { title => 'perl' },
            script => "_score * foo / doc['rank'].value",
        }}
  • Query example
    SearchBuilder:
        {
            -or => [
                title   => { '=' => { query => 'custom score', boost => 2 }},
                content => 'custom score',
            ],
            -filter => {
                repo       => 'elasticsearch/elasticsearch',
                created_at => { '>=' => '2011-07-01', '<' => '2011-08-01' },
                -or        => [
                    creator_id  => 123,
                    assignee_id => 123,
                ],
                labels => ['bug','breaking'],
            }
        }
  • Query example
    Query DSL:
        { query => {
            filtered => {
                query => {
                    bool => {
                        should => [
                            { text => { content => "custom score" } },
                            { text => { title => { boost => 2, query => "custom score" } } },
                        ],
                    },
                },
                filter => {
                    and => [
                        { or => [
                            { term => { creator_id  => 123 } },
                            { term => { assignee_id => 123 } },
                        ]},
                        { terms => { labels => ["bug", "breaking"] } },
                        { term => { repo => "elasticsearch/elasticsearch" } },
                        { numeric_range => { created_at => {
                            gte => "2011-07-01",
                            lt  => "2011-08-01",
                        }}},
                    ]
                },
            }
        }}
  • https://github.com/clintongormley/GitHubSearch