SIMPLE SEARCH WITH
  ELASTIC SEARCH
      MARK STORY
       @MARK_STORY
WAT?
Java based
Lucene powered
JSON driven
Document-oriented database
All-out super search solution
Easy to set up and use
INDEXES AND TYPES
            INDEXES
   Similar concept to databases.
   Contain multiple types.
TYPES
Similar concept to tables.
Defines datatypes and indexing rules.
DOCUMENT BASED REST API
     Simple to use and easy to understand.
CREATE A DOCUMENT
curl -XPOST localhost:9200/contacts/people -d '{
  "name": "Mark Story",
  "email": "mark@mark-story.com",
  "twitter": "@mark_story",
  "country": "Canada",
  "tags": ["cakephp", "cakefest", "canada"]
}'

# Response:
{"ok":true,
 "_index":"contacts",
 "_type":"people",
 "_id":"9izMCaSiQBqD1AJW8si57g",
 "_version":1}
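If you'd rather not shell out to curl, the same request is just an HTTP POST with a JSON body. Below is a minimal Python sketch that uses only the standard library. `build_index_request` is a hypothetical helper, and the `localhost:9200` address is assumed from the curl example. The sketch builds the request but does not send it.

```python
import json
import urllib.request

# Assumption: a node listening on localhost:9200, as in the curl example.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc):
    """Build (but do not send) a POST that indexes `doc` into /index/doc_type."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/{doc_type}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_index_request("contacts", "people", {
    "name": "Mark Story",
    "email": "mark@mark-story.com",
    "twitter": "@mark_story",
    "country": "Canada",
    "tags": ["cakephp", "cakefest", "canada"],
})
print(req.get_method(), req.full_url)  # POST http://localhost:9200/contacts/people
# urllib.request.urlopen(req) would actually send it and return the JSON response.
```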
READ IT BACK
curl -XGET "localhost:9200/contacts/people/$id?pretty=true"

# Response:
{"_index":"contacts",
 "_type":"people",
 "_id":"9izMCaSiQBqD1AJW8si57g",
 "_version":1,
 "exists":true,
 "_source":{
  "name": "Mark Story",
  "email": "mark@mark-story.com",
  "twitter": "@mark_story",
  "country": "Canada",
  "tags": ["cakephp", "cakefest", "canada"]
}}
DELETE IT!
curl -XDELETE localhost:9200/contacts/people/$id

# Response:
{"ok":true,
 "found":true,
 "_index":"contacts",
 "_type":"people",
 "_id":"9izMCaSiQBqD1AJW8si57g",
 "_version":2}
SIMPLE SEARCH!
More on search to come

curl -XGET "localhost:9200/contacts/people/_search?q=Mark&pretty=true"

# Response:
{"took":14,
 "timed_out":false,
 "_shards":{
  "total":5,
  "successful":5,
  "failed":0
 },
 "hits":{
  "total":1,
  "max_score":0.11744264,
  "hits":[
   {"_index":"contacts",
    "_type":"people",
    "_id":"slJlyMBTSWaqAMfZUU-lDw",
    "_score":0.11744264,
    "_source":{
     "name": "Mark Story",
     "email": "mark@mark-story.com",
     "twitter": "@mark_story",
     "country": "Canada",
     "tags": ["cakephp", "cakefest", "canada"]
    }
   }
  ]
}}
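Once a query string contains spaces or `&`, it needs escaping. Below is a small Python sketch that builds the same URI search URL. `build_search_url` is a hypothetical helper name, and the default base URL is assumed from the examples. `urlencode` does the escaping, so `mark OR weldon` becomes `mark+OR+weldon`.

```python
from urllib.parse import urlencode


def build_search_url(index, doc_type, query, pretty=True, base="http://localhost:9200"):
    """Build a URI search URL; urlencode escapes the q parameter for us."""
    params = {"q": query}
    if pretty:
        params["pretty"] = "true"
    return f"{base}/{index}/{doc_type}/_search?{urlencode(params)}"


print(build_search_url("contacts", "people", "Mark"))
# http://localhost:9200/contacts/people/_search?q=Mark&pretty=true
```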
THIS ALL SOUNDS TOO
 BADASS TO BE TRUE
DOCUMENT "DATABASE" IS A
      BIT LIMITED
     Partial updates are doable but painful
     No joins
     No MapReduce
     Can't replace all your other data sources
BUT SEARCH IS AMAZZZING
SEARCH BETWEEN TYPES &
       INDEXES
Search multiple types
curl -XGET "localhost:9200/contacts/people,companies/_search?q=name:Mark"

Search multiple indexes in your cluster
curl -XGET "localhost:9200/_all/people/_search?q=name:Mark"
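The URL path does the scoping. Index and type names are joined with commas, and `_all` stands in for every index. Below is a tiny sketch of that rule. `search_path` is a hypothetical helper and is not part of any client library.

```python
def search_path(indexes=None, types=None):
    """Comma-join index and type names; no indexes means every index (_all)."""
    path = "/" + (",".join(indexes) if indexes else "_all")
    if types:
        path += "/" + ",".join(types)
    return path + "/_search"


print(search_path(["contacts"], ["people", "companies"]))  # /contacts/people,companies/_search
print(search_path(types=["people"]))  # /_all/people/_search
```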
FANCY SEARCH OPTIONS
SEARCH WITH TEXT
       EXPRESSIONS
curl -XGET "localhost:9200/contacts/people/_search?pretty=true" -d '{
  "query": {
    "query_string": {
      "query": "mark OR weldon"
    }
  }
}'
HIGHLIGHT SEARCH
             KEYWORDS
Wraps matching search terms in highlighting markup (e.g. HTML). Great for larger
            documents, as you can extract just the matching fragments.
curl -XGET "localhost:9200/contacts/people/_search?pretty=true" -d '{
  "query": {
    "text": {
      "email": "mark"
    }
  },
  "highlight": {
    "fields": {
      "email": {},
      "name": {}
    }
  }
}'
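To get a feel for what comes back, here is a toy highlighter in Python. It is a simplification and not ElasticSearch's implementation: it assumes case-insensitive, whole-word matching on the raw text, whereas the real highlighter works on analyzed terms. It does reuse ElasticSearch's default `<em>` tags. `highlight` is a hypothetical name.

```python
import re


def highlight(text, terms, pre_tag="<em>", post_tag="</em>"):
    """Toy highlighter: wrap whole-word, case-insensitive matches of any term."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


print(highlight("mark@mark-story.com", ["mark"]))
# <em>mark</em>@<em>mark</em>-story.com
```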
FACETS
Facets provide aggregated data about a query. You can use this data
          to build drill-down searches or histograms.
         Term counts.
         Custom script values.
         Ranges - like price ranges.
         Geo distance facets - aggregate results by distance.
curl -XGET "localhost:9200/contacts/people/_search?pretty=true" -d '{
  "query": {
    "query_string": {
      "query": "*.com"
    }
  },
  "facets": {
    "tagged": {"terms": {"field": "tags"}}
  }
}'
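Conceptually, a terms facet is a counter over a field's values. The toy Python version below is a simplification. It counts only the hits you pass in, while the real facet counts indexed terms across every matching document, not just the returned page. Later ElasticSearch releases replaced facets with aggregations. `terms_facet` is a hypothetical name, and the `{"term", "count"}` entries mirror the shape of the facet's term list.

```python
from collections import Counter


def terms_facet(hits, field, size=10):
    """Toy terms facet: count values of `field` across hits, most common first."""
    counts = Counter()
    for hit in hits:
        value = hit["_source"].get(field)
        if value is None:
            continue  # documents without the field are skipped
        counts.update(value if isinstance(value, list) else [value])
    return [{"term": term, "count": count} for term, count in counts.most_common(size)]


hits = [
    {"_source": {"tags": ["cakephp", "cakefest", "canada"]}},
    {"_source": {"tags": ["cakephp"]}},
    {"_source": {"name": "No tags here"}},
]
print(terms_facet(hits, "tags"))
```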
KNOBS & BUTTONS
MAPPINGS
Mappings allow fine-grained searching later on. With a custom mapping you can:
Control the data types and indexing used for JSON document
types.
Disable indexing on specific fields.
Configure custom analyzers, for example non-English stemming.
AVAILABLE MAPPING TYPES
string, integer, float, boolean, null
object - Standard type for nested objects. Allows properties to be defined.
Arrays of any of the above are handled automatically.
multi_field - Allows a field to be handled multiple ways with different
aliases.
nested - Indexes sub-objects, and works with nested filters/queries.
ip - For IPv4 data.
geo_point - For lat/lon values. Enables piles of search options.
attachment - Store a blob. Can index many text-based documents
like PDF.
CREATE A MAPPING
curl -XPUT localhost:9200/contacts/people/_mapping -d '{
  "people": {
    "properties": {
      "name": {"type": "string"},
      "email": {"type": "string"},
      "twitter": {"type": "string"},
      "country": {"type": "string"},
      "tags": {"type": "string"}
    }
  }
}'
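Mapping bodies get repetitive quickly. Below is a sketch of a hypothetical `build_mapping` helper that builds the same shape as the request above from a field-to-type dict. The `string` type belongs to this era of ElasticSearch. Later releases split it into `text` and `keyword`.

```python
import json


def build_mapping(doc_type, field_types):
    """Build a put-mapping body: {doc_type: {"properties": {field: {"type": t}}}}."""
    return {
        doc_type: {
            "properties": {field: {"type": t} for field, t in field_types.items()}
        }
    }


body = build_mapping("people", {
    "name": "string",
    "email": "string",
    "twitter": "string",
    "country": "string",
    "tags": "string",
})
print(json.dumps(body, indent=2))
```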
DEFINE THE ANALYZER USED
When defining a field you can use analyzer, index_analyzer and
search_analyzer to customize the way data is stored and/or searched.
analyzer sets both sides at once; index_analyzer and search_analyzer
set them separately.
You can also disable analyzing for specific fields.
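Why disable analysis? An analyzed field stores tokens, but a term query or filter does no analysis and looks for the exact term. The toy Python sketch below shows this effect. `simple_analyze` is a rough stand-in (lowercase, then split on non-alphanumerics) and is not a real ElasticSearch analyzer. All the function names are hypothetical.

```python
import re


def simple_analyze(value):
    """Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def keep_whole(value):
    """What a not_analyzed field stores: the exact value as a single term."""
    return [value]


def term_matches(term, indexed_terms):
    """A term query does no analysis: it looks for the exact term."""
    return term in indexed_terms


print(simple_analyze("Mark Story"))  # ['mark', 'story']
print(term_matches("Mark Story", simple_analyze("Mark Story")))  # False
print(term_matches("Mark Story", keep_whole("Mark Story")))  # True
```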
DISABLE INDEXING
{
  "name": {
    "type": "string",
    "index": "not_analyzed"
  },
  "none": {
    "type": "integer",
    "index": "no"
  }
}
SHARDS & REPLICAS
                              SHARDS
    Define how many pieces your index is split into, so the data can be spread across nodes.
       If a node goes down, you still have some of your data.
        You can use routing to control how data is sharded.
More shards improve indexing performance, as the work is distributed.
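Routing boils down to `shard = hash(routing) % number_of_primary_shards`, where routing defaults to the document id. This formula is also why the number of primary shards is fixed when an index is created. The Python sketch below is an assumption-laden illustration. `zlib.crc32` stands in for ElasticSearch's own hash function, and `pick_shard` is a hypothetical name. Passing the same routing value, such as an account id, sends documents to the same shard.

```python
import zlib


def pick_shard(doc_id, number_of_shards, routing=None):
    """shard = hash(routing) % number_of_shards, with routing defaulting to the id.

    zlib.crc32 is only a stand-in; ElasticSearch uses its own hash function.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % number_of_shards


print(pick_shard("9izMCaSiQBqD1AJW8si57g", 5))  # some shard in 0..4
# Same routing value -> same shard, whatever the document id.
print(pick_shard("invoice-1", 5, routing="account-1") ==
      pick_shard("invoice-2", 5, routing="account-1"))  # True
```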
SIMPLE SHARDING
SHARD OVER MULTIPLE NODES
REPLICAS
          Define how many copies of your data you want.
    If several nodes go down, you might still have all your data.
More replicas improve search performance and cluster availability.
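The toy Python allocator below shows why replicas keep your data around when nodes fail. It relies on one real ElasticSearch rule: a replica is never placed on the same node as another copy of its shard. Everything else is simplified. `allocate` and `surviving_shards` are hypothetical names, and the round-robin placement is not how ElasticSearch actually balances shards.

```python
def allocate(num_shards, num_replicas, nodes):
    """Place each shard's primary + replicas on distinct nodes, round-robin.

    Copies never share a node, so any copies beyond len(nodes) are simply
    left unassigned in this toy version.
    """
    copies = min(num_replicas + 1, len(nodes))
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(copies)]
        for shard in range(num_shards)
    }


def surviving_shards(layout, failed_nodes):
    """Shards that still have at least one copy on a live node."""
    failed = set(failed_nodes)
    return {shard for shard, holders in layout.items()
            if any(node not in failed for node in holders)}


layout = allocate(5, 1, ["n1", "n2", "n3"])
print(sorted(surviving_shards(layout, ["n1"])))        # [0, 1, 2, 3, 4]
print(sorted(surviving_shards(layout, ["n1", "n2"])))  # [1, 2, 4]
```

With one replica, any single node can fail without data loss. Losing two of the three nodes takes out the shards whose copies lived only on those two nodes.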
REPLICAS
MULTI-TENANCY
Multi-tenancy is a reasonably common requirement, and there are a
                         few ways to do it.
ONE INDEX PER 'TENANT'
Great for a small number of tenants.
Painful for a large number of tenants, as sharding and replicas can
be harder to manage.
curl -XGET "localhost:9200/mark/contacts/_search?pretty=true" -d '{
  "query": {
    "query_string": {
      "query": "weldon OR jose"
    }
  }
}'
SPECIAL FILTER CONDITIONS
More error-prone, as you have to include a filter condition in every query.
Easy to shard and set up replicas.
Easily scales to many tenants, as shards/replicas are shared.
Make sure the tenant id is a non-analyzed value.
curl -XGET "localhost:9200/accounting/invoices/_search?pretty=true" -d '{
  "query": {
    "filtered": {
      "filter": {
        "term": {"accountid": 1}
      },
      "query": {
        "query_string": {
          "query": "purple wifi"
        }
      }
    }
  }
}'
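The main risk here is forgetting the filter, so one option is to never build tenant queries by hand. Below is a sketch of a hypothetical `tenant_search` helper that always wraps the user's query in the same `filtered` query as the invoice example, with `accountid` as the tenant field. Newer ElasticSearch releases dropped `filtered` in favour of a `bool` query's `filter` clause, so treat the exact shape as specific to this era.

```python
import json


def tenant_search(account_id, user_query):
    """Scope a query_string search to one tenant with a term filter."""
    return {
        "query": {
            "filtered": {
                "filter": {"term": {"accountid": account_id}},
                "query": {"query_string": {"query": user_query}},
            }
        }
    }


print(json.dumps(tenant_search(1, "purple wifi"), indent=2))
```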
OTHER BATTERIES INCLUDED
Routing - Define how documents are sharded.
Rivers - Pipe data in real time from sources like RabbitMQ.
Thrift - Talk Thrift to ElasticSearch.
INTEGRATION WITH
    CAKEPHP
HTTPSOCKET + JSON_ENCODE()
      Basic, can be hard to use.
      No magic.
ELASTICSEARCH DATASOURCE
          (David Kullman)
 Behavior to auto-index on afterSave
 Datasource for searching ElasticSearch
 Console app to index models
ELASTICSEARCH PLUGIN
        (Kevin von Zonneveld)
Similar features to the previous plugin
Offers more control over how data is indexed
QUESTIONS?