Real-time search in Drupal with Elasticsearch @Moldcamp

Real-time search in Drupal.
Meet Elasticsearch
By Alexei Gorobets
asgorobets

Elasticsearch
Flexible and powerful open source, distributed real-time
search and analytics engine for the cloud

● RESTful API
● Open Source
● JSON over HTTP
● based on Lucene
● distributed
● highly available
● schema free
● massively scalable

Setup in 2 steps:
1. Extract the archive
2. > bin/elasticsearch

> curl -XGET localhost:9200/?pretty

{
"ok" : true,
"status" : 200,
"name" : "Infinity",
"version" : {
"number" : "0.90.1",
"snapshot_build" : false,
"lucene_version" : "4.3"
},
"tagline" : "You Know, for Search"
}

Anatomy of the request:
-XGET = action (verb)
localhost:9200 = node + port
/ = path
?pretty = query string

> PUT /index/type/id

index = Where?
It's very similar to a database in SQL

type = What?
Like a table: content type, entity type,
or any kind of type you decide

id = Which?
Node ID, entity ID,
or any kind of serial ID

> PUT /mysite/node/1 -d
{
"nid": "1",
"status": "1",
"title": "Hello elasticsearch",
"body": "First elasticsearch document"
}

{
"ok":true,
"_index":"mysite",
"_type":"node",
"_id":"1",
"_version":1
}
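
The indexing request above can be sketched in Python with only the standard library. This is a minimal sketch, not a client implementation: the helper name `build_index_request` and the `host` default are assumptions made for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_index_request(index, doc_type, doc_id, document,
                        host="http://localhost:9200"):
    """Build (but do not send) a PUT /index/type/id request.

    `host` is an assumed default matching the curl example on the setup slide.
    """
    url = f"{host}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request("mysite", "node", 1, {
    "nid": "1",
    "status": "1",
    "title": "Hello elasticsearch",
    "body": "First elasticsearch document",
})
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`, which requires a node listening on the assumed host.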

> GET /mysite/node/1
{
"_index" : "mysite",
"_type" : "node",
"_id" : "1",
"_version" : 1,
"exists" : true,
"_source" : {
"nid":"1",
"status":"1",
"title":"Hello elasticsearch",
"body":"First elasticsearch document"
}
}

> GET /mysite/node/1?fields=title,body
Get specific fields

> GET /mysite/node/1/_source
Get source only

{
"status":"0"
}

{
"ok":true,
"_index":"mysite",
"_type":"node",
"_id":"1",
"_version":2
}

> DELETE /mysite/node/1
{
"ok":true,
"found":true,
"_index":"mysite",
"_type":"node",
"_id":"1",
"_version":3
}

> PUT /new_index -d '{
"settings" : {
"number_of_shards" : 3,
"number_of_replicas" : 2
}
}'
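
As a quick sanity check on the settings above: `number_of_replicas` counts replica copies per primary shard, so the cluster ends up holding more shard copies than `number_of_shards` alone suggests. The helper name below is hypothetical and exists only to show the arithmetic.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus all of their replicas."""
    return number_of_shards * (1 + number_of_replicas)


settings = {"settings": {"number_of_shards": 3, "number_of_replicas": 2}}
copies = total_shard_copies(**settings["settings"])
print(copies)  # 3 primaries + 6 replicas
```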

> PUT /myapp/node/1?version=1
{
"title": "hi girl"
}

{
"_index": "myapp",
"_type": "node",
"_id": "1",
"_version": 1,
"created": false
}

{
"title": "hey boy"
}
# 200

{
"title": "hey boy"
}
# 409
> version conflict, current [2], provided [1]
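
The version check above is a form of optimistic concurrency control. The following in-memory sketch illustrates the idea only; `VersionedStore` and `VersionConflict` are hypothetical names, and this is not how Elasticsearch itself is implemented.

```python
class VersionConflict(Exception):
    """Raised when the provided version does not match the stored one."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def put(self, doc_id, source, version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        # A write that names a version only succeeds if nobody
        # has changed the document since that version was read.
        if version is not None and version != current:
            raise VersionConflict(
                f"version conflict, current [{current}], provided [{version}]"
            )
        self._docs[doc_id] = (current + 1, source)
        return current + 1


store = VersionedStore()
store.put("1", {"title": "hi girl"})             # stored as version 1
store.put("1", {"title": "hey boy"}, version=1)  # accepted, now version 2
try:
    store.put("1", {"title": "hey boy"}, version=1)  # stale version
except VersionConflict as error:
    print(error)
```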

> GET /_search
{
"took" : 32,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"failed" : 0
},
"hits" : { results... }
}

Let's SEARCH in multiple
indices and types

> GET /index/_search
> GET /index/type/_search
> GET /index1,index2/_search
> GET /myapp_*/type,entity_*/_search

> GET /_search?size=10&from=20
size = results per page
from = starting from
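
Paging with `size` and `from` boils down to one multiplication. A small sketch, assuming zero-based page numbers; `page_params` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def page_params(page, size=10):
    """Translate a zero-based page number into size/from parameters."""
    return {"size": size, "from": page * size}


# The third page of ten results starts at offset 20.
query_string = urlencode(page_params(2))
print("/_search?" + query_string)
```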

> GET /_search?q=title:elasticsearch
> GET /_search?q=nid:60

+title:awesome
+status:1
+created:[1369917354 TO *]

?q=%2Btitle:awesome%20%2Bstatus:1%20%2Bcreated:[1369917354%20TO%20*]
The ugly encoding =)
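
Rather than encoding the query string by hand, the escaping can be left to a library. A minimal sketch using Python's `urllib.parse.quote`; the `safe` argument is chosen here only to keep `:`, `[`, `]` and `*` readable, which is an illustrative choice.

```python
from urllib.parse import quote, unquote

query = "+title:awesome +status:1 +created:[1369917354 TO *]"

# "+" becomes %2B and spaces become %20.
encoded = quote(query, safe=":[]*")
print("/_search?q=" + encoded)

# Decoding gives the original query back.
assert unquote(encoded) == query
```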

> GET /_search -d
{
"query": {
"match": { "title": "awesome" }
}
}

> GET /_search -d
{
"query": {
"match" : {
"title" : {
"query" : "+awesome -poor",
"boost" : 2.0
}
}
}
}

Core types
* string
* number
* date
* boolean

Complex types
* array type
* object type
* nested type

Others
* ip type
* geo point
* geo shape
* attachments

> PUT /myapp/node/_mapping -d
{
"node" : {
"properties" : {
"message" : {
"type" : "string",
"store" : true
}
}
}
}

Full text
analyzed
== is split into terms

Term
not analyzed
== is stored as is

> PUT /myapp/node/_mapping -d
{
"node" : {
"properties" : {
"name" : {
"type" : "string",
"store" : true,
"index" : "not_analyzed"
}
}
}
}

Inverted index
1. “The quick brown fox jumped over the lazy dog”
2. “Quick brown foxes leap over lazy dogs in summer”

Term   | Doc_1 | Doc_2
-------------------------
Quick  |       |   X
The    |   X   |
brown  |   X   |   X
dog    |   X   |
dogs   |       |   X
fox    |   X   |
foxes  |       |   X
in     |       |   X
jumped |   X   |
lazy   |   X   |   X
leap   |       |   X
over   |   X   |   X
quick  |   X   |
summer |       |   X
the    |   X   |
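
The table above can be reproduced with a few lines of Python. This sketch splits on whitespace only and keeps case as-is, which is why "Quick" and "quick" end up as separate terms; `build_inverted_index` is a hypothetical helper name.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return dict(index)


docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}
index = build_inverted_index(docs)

for term in sorted(index):
    print(f"{term:7}| {sorted(index[term])}")
```

Lowercasing and stemming the terms before indexing is what an analyzer adds on top of this.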

Analyzer
Tokenizers
● standard
● keyword
● whitespace
● ngram
Token filters
● standard
● lowercase
● stop
● truncate
● snowball

> GET /_analyze?analyzer=standard -d
'this is a test baby'
{
"tokens" : [ {
"token" : "test",
"start_offset" : 10,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 4
}, {
"token" : "baby",
"start_offset" : 15,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 5
} ]
}
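
To show how a tokenizer and token filters combine into the output above, here is a rough Python imitation. It is a sketch only: the regular expression stands in for the standard tokenizer, and `STOP_WORDS` is a small assumed list, not the analyzer's actual stop word set.

```python
import re

# Assumed, abbreviated stop word list for illustration only.
STOP_WORDS = {"this", "is", "a", "an", "the", "and", "or", "of", "to", "in"}


def analyze(text, stop_words=STOP_WORDS):
    """Tokenize, lowercase, and drop stop words, keeping original positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        term = match.group().lower()       # lowercase filter
        if term in stop_words:             # stop filter
            continue
        tokens.append({
            "token": term,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


tokens = analyze("this is a test baby")
print(tokens)
```

Positions are counted before the stop filter runs, which is why the surviving tokens keep positions 4 and 5.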

Queries & Filters
Queries
● full text search
● relevance score
● heavy
● not cacheable
Filters
● exact match
● show or hide
● lightning fast
● cacheable

> GET /_search -d
{
"query": {
"filtered": {
"query": {
"match": { "title": "awesome" }
},
"filter": {
"term": { "type": "article" }
}
}
}
}

> GET /_search -d
{
"query": {
"filtered": {
"query": {
"match": { "title": "awesome" }
},
"filter": {
"term": { "type": "article" }
}
}
},
"sort": {"date":"desc"}
}
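
Building the request body as a Python dictionary and serializing it avoids hand-written JSON mistakes such as a missing comma. A minimal sketch; `filtered_search` is a hypothetical helper, and the body mirrors the `filtered` query shape shown on the slide.

```python
import json


def filtered_search(field, text, filter_field, filter_value, sort=None):
    """Combine a scored match query with an unscored term filter."""
    body = {
        "query": {
            "filtered": {
                "query": {"match": {field: text}},
                "filter": {"term": {filter_field: filter_value}},
            }
        }
    }
    if sort:
        body["sort"] = sort
    return body


body = filtered_search("title", "awesome", "type", "article",
                       sort={"date": "desc"})
print(json.dumps(body, indent=2))
```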

Term frequency
How often does the term appear in the field?
The more often, the more relevant.
Inverse document frequency
How often does each term appear in the
index? The more often, the less relevant.
Field norm
How long is the field? The longer it is, the less
likely it is that words in the field will be
relevant.
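
The three factors above can be written down as small functions. This sketch assumes the classic Lucene TF/IDF similarity formulas (square root for term frequency, logarithm for inverse document frequency, inverse square root for the field norm); the function names are illustrative, and the real score combines further factors such as boosts.

```python
import math


def term_frequency(freq):
    """More occurrences in the field -> higher value."""
    return math.sqrt(freq)


def inverse_document_frequency(doc_freq, num_docs):
    """More documents containing the term -> lower value."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms):
    """Longer field -> lower value."""
    return 1.0 / math.sqrt(num_terms)


# A rare term in a short field outweighs a common term in a long one.
rare_short = term_frequency(1) * inverse_document_frequency(1, 100) * field_norm(4)
common_long = term_frequency(1) * inverse_document_frequency(50, 100) * field_norm(100)
print(rare_short > common_long)
```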

> GET /_search -d
{
"facets": {
"home_team": {
"terms": {
"field": "field_home_team"
}
}
}
}

Give your facet a name ("home_team" in the request above)

Your facet type ("terms" in the request above) can be:
● Terms
● Range
● Histogram
● Date Histogram
● Filter
● Query
● Statistical
● Terms Stats
● Geo Distance

"facets" : {
"home_team" : {
"_type" : "terms",
"missing" : 203,
"total" : 100,
"other" : 42,
"terms" : [ {
"term" : "hou",
"count" : 8
}, {
"term" : "sln",
"count" : 6
}, ...
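
A terms facet is essentially a grouped count. The sketch below imitates the response fields in plain Python, under the assumption that `missing` counts documents without a value, `total` counts all values, and `other` counts values outside the returned top terms; `terms_facet` is a hypothetical helper and the sample documents are made up for illustration.

```python
from collections import Counter


def terms_facet(docs, field, size=10):
    """Count field values and report the top `size` terms."""
    values = [doc[field] for doc in docs if doc.get(field) is not None]
    top = Counter(values).most_common(size)
    returned = sum(count for _, count in top)
    return {
        "_type": "terms",
        "missing": len(docs) - len(values),
        "total": len(values),
        "other": len(values) - returned,
        "terms": [{"term": term, "count": count} for term, count in top],
    }


# Made-up sample documents.
games = (
    [{"field_home_team": "hou"}] * 3
    + [{"field_home_team": "sln"}] * 2
    + [{"field_home_team": "nyn"}]
    + [{"title": "no team recorded"}]
)
facet = terms_facet(games, "field_home_team", size=2)
print(facet)
```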

Available modules:
Elasticsearch
Elasticsearch Connector
Search API elasticsearch

Development directions:
1. Search API implementation
2. Field Storage API
3. Alternative backends

Field Storage API implementation
Elasticsearch field storage sandbox by Damien Tournoud
Started in July 2011

Elasticsearch EntityFieldQuery sandbox
https://drupal.org/sandbox/asgorobets/2073151
