Real-time search in Drupal.
Meet Elasticsearch
By Alexei Gorobets
asgorobets
Elasticsearch
Flexible and powerful open source, distributed real-time
search and analytics engine for the cloud
Why use Elasticsearch?
● RESTful API
● Open Source
● JSON over HTTP
● based on Lucene
● distributed
● highly available
● schema free
● massively scalable
Setup in 2 steps:
1. Extract the archive
2. > bin/elasticsearch
How to use it?
> curl -XGET localhost:9200/?pretty
{
"ok" : true,
"status" : 200,
"name" : "Infinity",
"version" : {
"number" : "0.90.1",
"snapshot_build" : false,
"lucene_version" : "4.3"
},
"tagline" : "You Know, for Search"
}
> curl -XGET localhost:9200/?pretty
-XGET = action (verb)
localhost:9200 = node + port
/ = path
?pretty = query string
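A minimal sketch of the same anatomy, using only Python's standard library. The `http://` scheme is an assumption added so the URL can be parsed; the slide leaves it implicit.

```python
from urllib.parse import urlsplit

# The request from the slide, with an explicit scheme (assumed) so it parses.
url = "http://localhost:9200/?pretty"
parts = urlsplit(url)

anatomy = {
    "verb": "GET",                # chosen with curl's -X flag, not part of the URL
    "node": parts.hostname,       # which Elasticsearch node to talk to
    "port": parts.port,           # the HTTP port of that node
    "path": parts.path,           # "/" is the root endpoint
    "query_string": parts.query,  # "pretty" asks for indented JSON
}
print(anatomy)
```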
Let's index some data
> PUT /index/type/id
index = Where? Very similar to a database in SQL.
type = What? Like a table: a content type, an entity type, any kind of type you decide.
id = Which? A node ID, an entity ID, any kind of serial ID.
> PUT /mysite/node/1 -d
{
"nid": "1",
"status": "1",
"title": "Hello elasticsearch",
"body": "First elasticsearch document"
}
{
"ok":true,
"_index":"mysite",
"_type":"node",
"_id":"1",
"_version":1
}
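The index request above could be built from Python roughly as follows. This is a sketch, not a client library: `build_index_request` is a hypothetical helper, and the host `http://localhost:9200` is assumed from the earlier curl slide. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request

def build_index_request(index, doc_type, doc_id, document,
                        host="http://localhost:9200"):
    """Build (but do not send) a PUT request for /index/type/id."""
    url = f"{host}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})

request = build_index_request("mysite", "node", 1, {
    "nid": "1",
    "status": "1",
    "title": "Hello elasticsearch",
    "body": "First elasticsearch document",
})
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it to a running node.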
Let's GET some data
> GET /mysite/node/1
{
"_index" : "mysite",
"_type" : "node",
"_id" : "1",
"_version" : 1,
"exists" : true,
"_source" : {
"nid":"1",
"status":"1",
"title":"Hello elasticsearch",
"body":"First elasticsearch document"
}
}
> GET /mysite/node/1?fields=title,body
Get specific fields
> GET /mysite/node/1/_source
Get source only
Let's UPDATE some data
> PUT /mysite/node/1 -d
{
"status":"0"
}
{
"ok":true,
"_index":"mysite",
"_type":"node",
"_id":"1",
"_version":2
}
UPDATE = DELETE + PUT
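What that means in practice: a second PUT replaces the whole document rather than patching one field. The sketch below imitates this with a plain dictionary standing in for the index; `put` and `index_store` are illustrative names, not Elasticsearch APIs.

```python
index_store = {}  # in-memory stand-in for an index: id -> (version, document)

def put(doc_id, document):
    """Index a document, replacing any previous one and bumping its version."""
    previous_version = index_store[doc_id][0] if doc_id in index_store else 0
    index_store[doc_id] = (previous_version + 1, dict(document))
    return previous_version + 1

original = {"nid": "1", "status": "1", "title": "Hello elasticsearch"}
put("1", original)
put("1", {"status": "0"})  # the old document is replaced, not patched
version, document = index_store["1"]
print(version, document)

# To change one field and keep the rest, merge first, then re-index the result.
merged = {**original, "status": "0"}
```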
Let's DELETE some data
> DELETE /mysite/node/1
{
"ok":true,
"found":true,
"_index":"mysite",
"_type":"node",
"_id":"1",
"_version":3
}
Distributed, Highly Available
> PUT /new_index -d '{
"settings" : {
"number_of_shards" : 3,
"number_of_replicas" : 2
}
}'
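A quick sketch of the arithmetic behind these two settings: each primary shard gets `number_of_replicas` copies, so the index above ends up with 9 shard copies to spread across nodes. `shard_copies` is a hypothetical helper for illustration only.

```python
def shard_copies(number_of_shards, number_of_replicas):
    """Return (primaries, replicas, total) shard copies for one index."""
    primaries = number_of_shards
    replicas = number_of_shards * number_of_replicas  # copies of every primary
    return primaries, replicas, primaries + replicas

primaries, replicas, total = shard_copies(number_of_shards=3, number_of_replicas=2)
print(primaries, replicas, total)
```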
Concurrency, Version control
> PUT /myapp/node/1?version=1
{
"title": "hi girl"
}
{
"_index": "myapp",
"_type": "node",
"_id": "1",
"_version": 1,
"created": false
}
> PUT /myapp/node/1?version=1
{
"title": "hey boy"
}
# 200
> PUT /myapp/node/1?version=1
{
"title": "hey boy"
}
# 409
> version conflict, current [2], provided [1]
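The version check can be pictured as optimistic concurrency control: a write carrying a version succeeds only if that version is still the current one. Below is a simplified in-memory sketch of the idea; `VersionedStore` and `VersionConflict` are made-up names, and the real engine does considerably more.

```python
class VersionConflict(Exception):
    """Raised when the provided version is not the current one (HTTP 409)."""

class VersionedStore:
    def __init__(self):
        self._docs = {}  # id -> (version, document)

    def put(self, doc_id, document, version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        if version is not None and version != current:
            raise VersionConflict(
                f"version conflict, current [{current}], provided [{version}]")
        self._docs[doc_id] = (current + 1, document)
        return current + 1

versioned = VersionedStore()
versioned.put("1", {"title": "hi girl"})             # creates version 1
versioned.put("1", {"title": "hey boy"}, version=1)  # succeeds, now version 2

message = None
try:
    versioned.put("1", {"title": "hey boy"}, version=1)  # stale version
except VersionConflict as error:
    message = str(error)
print(message)
```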
Let's SEARCH for something
> GET /_search
{
"took" : 32,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"failed" : 0
},
"hits" : { results... }
}
Let's SEARCH in multiple indices and types
> GET /index/_search
> GET /index/type/_search
> GET /index1,index2/_search
> GET /myapp_*/type,entity_*/_search
Let's PAGINATE results
> GET /_search?size=10&from=20
size = results per page
from = starting from
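A pager usually thinks in page numbers, so the two parameters are typically derived like this. `page_params` is a hypothetical helper and pages are assumed to be 1-based; the slide's `size=10&from=20` is then page 3.

```python
def page_params(page, per_page=10):
    """Translate a 1-based page number into size/from search parameters."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"size": per_page, "from": (page - 1) * per_page}

params = page_params(3)
print(params)
```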
Let's search oldschool
> GET /_search?q=title:elasticsearch
> GET /_search?q=nid:60
+title:awesome
+status:1
+created:[1369917354 TO *]
?q=title:awesome%20%2Bcreated:[1369917354%20TO%20*]%2Bstatus:1
The ugly encoding =)
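Nobody should encode this by hand. A small sketch with Python's standard library, assuming the query is passed in the `q` parameter as on the slides; note that `urlencode` writes spaces as `+` and literal plus signs as `%2B`, which decodes back to the same query.

```python
from urllib.parse import parse_qs, urlencode

lucene_query = "+title:awesome +status:1 +created:[1369917354 TO *]"

# Encode the whole query as the value of the "q" parameter.
query_string = urlencode({"q": lucene_query})
print("/_search?" + query_string)

# Decoding gives back exactly what was typed.
decoded = parse_qs(query_string)["q"][0]
```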
Query DSL style
> GET /_search -d
{
"query": {
"match": { "title": "awesome" }
}
}
> GET /_search -d
{
"query": {
"match" : {
"title" : {
"query" : "+awesome -poor",
"boost" : 2.0
}
}
}
}
Mappings and types
Core types
* string
* number
* date
* boolean
Complex types
* array type
* object type
* nested type
Others:
* ip type
* geo point
* geo shape
* attachments
Define type mapping
> PUT /myapp/node/_mapping -d
{
"node" : {
"properties" : {
"message" : {
"type" : "string",
"store" : true
}
}
}
}
Indexed fields
Full text: analyzed == split into terms
Term: not analyzed == stored as is
> PUT /myapp/node/_mapping -d
{
"node" : {
"properties" : {
"name" : {
"type" : "string",
"store" : true,
"index": "not_analyzed"
}
}
}
}
Dynamic mapping
Analysis and indexing
Inverted index
1. “The quick brown fox jumped over the lazy dog”
2. “Quick brown foxes leap over lazy dogs in summer”
Term Doc_1 Doc_2
-------------------------
Quick | | X
The | X |
brown | X | X
dog | X |
dogs | | X
fox | X |
foxes | | X
in | | X
jumped | X |
lazy | X | X
leap | | X
over | X | X
quick | X |
summer | | X
the | X |
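The table above can be reproduced in a few lines. This is a toy sketch that splits on whitespace only and keeps case as is, which is why `Quick` and `quick` end up as separate terms; `build_inverted_index` is an illustrative name.

```python
from collections import defaultdict

docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

def build_inverted_index(documents):
    """Map every term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():  # naive whitespace tokenizer, no lowercasing
            index[term].add(doc_id)
    return index

inverted = build_inverted_index(docs)
for term in sorted(inverted):
    print(f"{term:<8}", sorted(inverted[term]))
```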
Analyzer
Tokenizers
● standard
● keyword
● whitespace
● ngram
Token filters
● standard
● lowercase
● stop
● truncate
● snowball
> GET /_analyze?analyzer=standard -d
'this is a test baby'
{
"tokens" : [ {
"token" : "test",
"start_offset" : 10,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 4
}, {
"token" : "baby",
"start_offset" : 15,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 5
} ]
}
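Why only two tokens come back: the text is tokenized, lowercased, and stop words are dropped, but the dropped words still consume positions. A rough imitation follows, assuming a tiny hand-picked stop word list and a simple `\w+` tokenizer; it is not the real standard analyzer.

```python
import re

STOPWORDS = {"a", "an", "and", "is", "the", "this"}  # small illustrative subset

def analyze(text, stopwords=STOPWORDS):
    """Tokenize, lowercase and drop stop words, keeping offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        term = match.group().lower()
        if term in stopwords:
            continue  # removed, but its position is still counted
        tokens.append({
            "token": term,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

tokens = analyze("this is a test baby")
print(tokens)
```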
Autocomplete fields
Queries & Filters
Queries: full text search, relevance score, heavy, not cacheable
Filters: exact match, show or hide, lightning fast, cacheable
Combine Filters & Queries
> GET /_search -d
{
"query": {
"filtered": {
"query": {
"match": { "title": "awesome" }
},
"filter": {
"term": { "type": "article" }
}
}
}
}
and Sorting
> GET /_search -d
{
"query": {
"filtered": {
"query": {
"match": { "title": "awesome" }
},
"filter": {
"term": { "type": "article" }
}
}
},
"sort": {"date":"desc"}
}
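From application code, a request body like this is usually assembled as a data structure and serialized, which avoids missing commas. A sketch, where `filtered_search` is a hypothetical helper mirroring the slide's `filtered` query:

```python
import json

def filtered_search(field, text, filter_field, filter_value,
                    sort_field=None, order="desc"):
    """Build a search body: full-text query + exact-match filter + optional sort."""
    body = {
        "query": {
            "filtered": {
                "query": {"match": {field: text}},
                "filter": {"term": {filter_field: filter_value}},
            }
        }
    }
    if sort_field is not None:
        body["sort"] = {sort_field: order}
    return body

body = filtered_search("title", "awesome", "type", "article", sort_field="date")
print(json.dumps(body, indent=2))
```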
Relevance. Explain API
Term frequency
How often does the term appear in the field?
The more often, the more relevant.
Inverse document frequency
How often does each term appear in the index?
The more often, the less relevant.
Field norm
How long is the field? The longer it is, the less likely it is
that words in the field will be relevant.
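The three factors can be sketched numerically. The formulas below are loosely modeled on Lucene's classic TF-IDF similarity (square root of the frequency, a logarithmic IDF, inverse square root of the field length); the way they are multiplied in `term_weight` is a simplification, and real scoring involves further factors.

```python
import math

def term_frequency(freq):
    """More occurrences in the field -> higher, with diminishing returns."""
    return math.sqrt(freq)

def inverse_document_frequency(num_docs, doc_freq):
    """Terms found in many documents -> lower weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_norm(num_terms):
    """Longer fields -> lower weight per matching term."""
    return 1.0 / math.sqrt(num_terms)

def term_weight(freq, num_docs, doc_freq, num_terms):
    # Simplified combination of the three factors (an assumption of this sketch).
    return (term_frequency(freq)
            * inverse_document_frequency(num_docs, doc_freq)
            * field_norm(num_terms))

short_title = term_weight(freq=1, num_docs=1000, doc_freq=10, num_terms=4)
long_body = term_weight(freq=1, num_docs=1000, doc_freq=10, num_terms=400)
print(short_title > long_body)
```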
and Facets
Facets on Amazon
Give your facet a name
> GET /_search -d
{
"facets": {
"home_team": {
"terms": {
"field": "field_home_team"
}
}
}
}
Your facet type can be:
● Terms
● Range
● Histogram
● Date Histogram
● Filter
● Query
● Statistical
● Terms Stats
● Geo Distance
"facets" : {
"home_team" : {
"_type" : "terms",
"missing" : 203,
"total" : 100,
"other" : 42,
"terms" : [ {
"term" : "hou",
"count" : 8
}, {
"term" : "sln",
"count" : 6
}, ...
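To read that response: a terms facet is essentially a grouped count. The sketch below imitates it over a handful of made-up documents, assuming `missing` counts documents without the field, `total` counts all values seen, and `other` is whatever did not fit into the returned top terms. `terms_facet` and the sample `games` are illustrative only.

```python
from collections import Counter

def terms_facet(documents, field, size=10):
    """Count field values across documents, returning the top `size` terms."""
    counts = Counter()
    missing = 0
    for doc in documents:
        value = doc.get(field)
        if value is None:
            missing += 1  # document has no value for this field
        else:
            counts[value] += 1
    top = counts.most_common(size)
    total = sum(counts.values())
    return {
        "_type": "terms",
        "missing": missing,
        "total": total,
        "other": total - sum(count for _, count in top),
        "terms": [{"term": term, "count": count} for term, count in top],
    }

# Hypothetical sample data, not taken from the slides.
games = ([{"field_home_team": "hou"}] * 3
         + [{"field_home_team": "sln"}] * 2
         + [{"field_home_team": "nya"}, {}])
facet = terms_facet(games, "field_home_team", size=2)
print(facet)
```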
STOP! I want this in Drupal?
Available modules:
Elasticsearch
Elasticsearch Connector
Search API elasticsearch
Development directions:
1. Search API implementation
2. Field Storage API
3. Alternative backends
Field Storage API implementation
Elasticsearch field storage sandbox by Damien Tournoud
Started in July 2011
Elasticsearch EntityFieldQuery sandbox
https://drupal.org/sandbox/asgorobets/2073151
Let's DEMO
Let the Search be with you
Real-time search in Drupal with Elasticsearch @Moldcamp