Real-time search in Drupal with Elasticsearch @Moldcamp
Presentation Transcript

  • Real-time search in Drupal: meet Elasticsearch. By Alexei Gorobets (asgorobets)
  • Elasticsearch: a flexible and powerful open source, distributed, real-time search and analytics engine for the cloud
  • Why use Elasticsearch?
  • ● RESTful API ● Open Source ● JSON over HTTP ● based on Lucene ● distributed ● highly available ● schema free ● massively scalable
  • Setup in 2 steps: 1. Extract the archive 2. > bin/elasticsearch
  • How to use it?
  • > curl -XGET localhost:9200/?pretty
    {
      "ok" : true,
      "status" : 200,
      "name" : "Infinity",
      "version" : { "number" : "0.90.1", "snapshot_build" : false, "lucene_version" : "4.3" },
      "tagline" : "You Know, for Search"
    }
  • Anatomy of the request: -XGET is the action (verb), localhost:9200 is the node + port, / is the path, ?pretty is the query string
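The four parts of that request can also be pulled apart programmatically. This is a minimal Python sketch using only the standard library; the `http://` scheme is an assumption, since the slide leaves it out.

```python
from urllib.parse import urlsplit

# The slide omits the scheme; "http://" is assumed here.
url = "http://localhost:9200/?pretty"
parts = urlsplit(url)

verb = "GET"                              # action (verb), from curl's -XGET
node = f"{parts.hostname}:{parts.port}"   # node + port
path = parts.path                         # path
query = parts.query                       # query string

print(verb, node, path, query)
```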
  • Let's index some data
  • > PUT /index/type/id
    index (Where?): very similar to a database in SQL
    type (What?): like a table; a content type, entity type, or any kind of type you decide
    id (Which?): node ID, entity ID, or any kind of serial ID
  • > PUT /mysite/node/1 -d
    { "nid": "1", "status": "1", "title": "Hello elasticsearch", "body": "First elasticsearch document" }
    Response:
    { "ok":true, "_index":"mysite", "_type":"node", "_id":"1", "_version":1 }
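The same indexing call could be prepared from Python roughly as follows. This is only a sketch: `index_request` is a hypothetical helper, the host `http://localhost:9200` is assumed from the earlier curl example, and the request is built but deliberately not sent, so no running node is needed.

```python
import json
from urllib.request import Request


def index_request(index, doc_type, doc_id, document,
                  host="http://localhost:9200"):
    """Build (but do not send) a PUT /index/type/id request."""
    url = f"{host}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


req = index_request("mysite", "node", 1, {
    "nid": "1",
    "status": "1",
    "title": "Hello elasticsearch",
    "body": "First elasticsearch document",
})
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it to a running node.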
  • Let's GET some data
  • > GET /mysite/node/1
    {
      "_index" : "mysite",
      "_type" : "node",
      "_id" : "1",
      "_version" : 1,
      "exists" : true,
      "_source" : { "nid":"1", "status":"1", "title":"Hello elasticsearch", "body":"First elasticsearch document" }
    }
  • > GET /mysite/node/1?fields=title,body (get specific fields)
  • > GET /mysite/node/1/_source (get source only)
  • Let's UPDATE some data
  • > PUT /mysite/node/1 -d
    { "status":"0" }
    Response:
    { "ok":true, "_index":"mysite", "_type":"node", "_id":"1", "_version":2 }
  • UPDATE = DELETE + PUT
  • Let's DELETE some data
  • > DELETE /mysite/node/1
    { "ok":true, "found":true, "_index":"mysite", "_type":"node", "_id":"1", "_version":3 }
  • Distributed, Highly Available
  • > PUT /new_index -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 2 } }'
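Because `number_of_replicas` applies to each primary shard, those settings imply more shard copies than the two numbers suggest at first glance. A quick back-of-the-envelope sketch (`shard_copies` is a hypothetical helper, not an Elasticsearch API):

```python
def shard_copies(number_of_shards, number_of_replicas):
    """Total shard copies in the cluster: each primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)


settings = {"number_of_shards": 3, "number_of_replicas": 2}
total = shard_copies(**settings)
print(total)  # 3 primaries + 6 replicas = 9
```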
  • Concurrency, Version control
  • > PUT /myapp/node/1?version=1 { "title": "hi girl" }
    { "_index": "myapp", "_type": "node", "_id": "1", "_version": 1, "created": false }
  • > PUT /myapp/node/1?version=1 { "title": "hey boy" }
    # 200 if the stored version is still 1
    # 409 if the document has changed in the meantime > version conflict, current [2], provided [1]
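The version check is a form of optimistic concurrency control. The toy in-memory store below illustrates the idea only; `VersionedStore` and `VersionConflict` are made-up names, not part of any Elasticsearch client. It also shows that a write replaces the whole stored document.

```python
class VersionConflict(Exception):
    """Raised when the caller's version does not match the stored one."""


class VersionedStore:
    """Toy in-memory stand-in for an index; not an Elasticsearch client."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def put(self, doc_id, source, version=None):
        current = self.docs.get(doc_id, (0, None))[0]
        if version is not None and version != current:
            raise VersionConflict(
                f"version conflict, current [{current}], provided [{version}]")
        # The new source replaces the old one entirely.
        self.docs[doc_id] = (current + 1, source)
        return current + 1


store = VersionedStore()
store.put("1", {"title": "hello"})                    # creates version 1
store.put("1", {"title": "hi girl"}, version=1)       # succeeds, now version 2
try:
    store.put("1", {"title": "hey boy"}, version=1)   # stale version
except VersionConflict as err:
    message = str(err)
    print(message)
```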
  • Let's SEARCH for something
  • > GET /_search
    {
      "took" : 32,
      "timed_out" : false,
      "_shards" : { "total" : 20, "successful" : 20, "failed" : 0 },
      "hits" : { results... }
    }
  • Let's SEARCH in multiple indices and types
  • > GET /index/_search
    > GET /index/type/_search
    > GET /index1,index2/_search
    > GET /myapp_*/type,entity_*/_search
  • Let's PAGINATE results
  • > GET /_search?size=10&from=20
    size = results per page
    from = offset of the first result to return
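Translating a pager's page number into these two parameters is simple arithmetic. A small sketch, assuming zero-based page numbers as in Drupal's pager (`page_params` is a hypothetical helper):

```python
def page_params(page, per_page=10):
    """Translate a zero-based page number into size/from parameters."""
    return {"size": per_page, "from": page * per_page}


params = page_params(page=2)
print(params)  # the third page of 10 results starts at offset 20
```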
  • Let's search oldschool
  • > GET /_search?q=title:elasticsearch
    > GET /_search?q=nid:60
  • +title:awesome +status:1 +created:[1369917354 TO *]
  • ?q=title:awesome%20%2Bcreated:[1369917354%20TO%20*]%2Bstatus:1
    The ugly encoding =)
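Nobody needs to produce that encoding by hand. A sketch with Python's standard library follows; note that `urlencode` writes spaces as `+` rather than `%20`, and both forms decode to a space inside a query string.

```python
from urllib.parse import parse_qs, urlencode

query = "+title:awesome +status:1 +created:[1369917354 TO *]"

# Percent-encode the Lucene query so it survives as a single q= parameter.
encoded = urlencode({"q": query})
print("/_search?" + encoded)

# Decoding gives the original query string back.
decoded = parse_qs(encoded)["q"][0]
```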
  • Query DSL style
  • > GET /_search -d { "query": { "match": { "title": "awesome" } } }
  • > GET /_search -d { "query": { "match" : { "title" : { "query" : "+awesome -poor", "boost" : 2.0 } } } }
  • Mappings and types
  • Core types: * string * number * date * boolean
  • Complex types: * array type * object type * nested type
    Others: * ip type * geo point * geo shape * attachments
  • Define type mapping
  • > PUT /myapp/node/_mapping -d { "node" : { "properties" : { "message" : { "type" : "string", "store" : true } } } }
  • Indexed fields
  • Full text: analyzed == split into terms
    Term: not analyzed == stored as is
  • > PUT /myapp/node/_mapping -d { "node" : { "properties" : { "name" : { "type" : "string", "store" : true, "index": "not_analyzed" } } } }
  • Dynamic mapping
  • Analysis and indexing
  • Inverted index
    1. “The quick brown fox jumped over the lazy dog”
    2. “Quick brown foxes leap over lazy dogs in summer”
    Term    | Doc_1 | Doc_2
    -------------------------
    Quick   |       |   X
    The     |   X   |
    brown   |   X   |   X
    dog     |   X   |
    dogs    |       |   X
    fox     |   X   |
    foxes   |       |   X
    in      |       |   X
    jumped  |   X   |
    lazy    |   X   |   X
    leap    |       |   X
    over    |   X   |   X
    quick   |   X   |
    summer  |       |   X
    the     |   X   |
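The table above can be reproduced in a few lines. This is a simplified sketch that splits on whitespace and keeps case as is, which is why "Quick" and "quick" end up as separate terms; a real analyzer would normalise them.

```python
from collections import defaultdict

docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}


def build_inverted_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


index = build_inverted_index(docs)
for term in sorted(index):
    print(term, sorted(index[term]))
```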
  • Analyzer
    Tokenizers: ● standard ● keyword ● whitespace ● ngram
    TokenFilters: ● standard ● lowercase ● stop ● truncate ● snowball
  • > GET /_analyze?analyzer=standard -d 'this is a test baby'
    {
      "tokens" : [
        { "token" : "test", "start_offset" : 10, "end_offset" : 14, "type" : "<ALPHANUM>", "position" : 4 },
        { "token" : "baby", "start_offset" : 15, "end_offset" : 19, "type" : "<ALPHANUM>", "position" : 5 }
      ]
    }
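To see why only "test" and "baby" come back, here is a toy analyzer that tokenizes, lowercases and drops stop words. It is a sketch of the idea, not Lucene's implementation, and the three-word stop list is an assumption chosen to fit this one sentence.

```python
import re

STOP_WORDS = {"a", "is", "this"}  # tiny illustrative list, not Lucene's


def analyze(text):
    """Tokenize on word characters, lowercase, and drop stop words."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        term = match.group().lower()
        if term in STOP_WORDS:
            continue  # dropped, but its position is still counted
        tokens.append({
            "token": term,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


tokens = analyze("this is a test baby")
for token in tokens:
    print(token)
```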
  • Autocomplete fields
  • Queries & Filters
    Queries: full text search, relevance score, heavy, not cacheable
    Filters: exact match, show or hide, lightning fast, cacheable
  • Combine Filters & Queries
  • > GET /_search -d { "query": { "filtered": { "query": { "match": { "title": "awesome" } }, "filter": { "term": { "type": "article" } } } } }
  • and Sorting
  • > GET /_search -d { "query": { "filtered": { "query": { "match": { "title": "awesome" } }, "filter": { "term": { "type": "article" } } } }, "sort": {"date":"desc"} }
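Hand-written JSON of this depth is easy to get subtly wrong; a single missing comma is enough. One option is to build the body as a data structure and serialize it. A sketch, where `filtered_search` is a hypothetical helper and the field names are the ones from the slide:

```python
import json


def filtered_search(field, text, filter_field, filter_value,
                    sort_field, order="desc"):
    """Build a filtered query: scored full-text match plus an exact filter."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {field: text}},
                "filter": {"term": {filter_field: filter_value}},
            }
        },
        "sort": {sort_field: order},
    }


body = filtered_search("title", "awesome", "type", "article", "date")
payload = json.dumps(body, indent=2)
print(payload)
```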
  • Relevance. Explain API
  • Term frequency: How often does the term appear in the field? The more often, the more relevant.
    Inverse document frequency: How often does each term appear in the index? The more often, the less relevant.
    Field norm: How long is the field? The longer it is, the less likely it is that words in the field will be relevant.
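The three factors can be made concrete with a little arithmetic. The formulas below are modelled on Lucene's classic TF/IDF similarity, but treat their exact form as an assumption; the real score also involves boosts and query normalisation that this toy version ignores.

```python
import math


def term_frequency(freq):
    """More occurrences in the field -> higher, with diminishing returns."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    """Terms found in many documents of the index count for less."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms):
    """Longer fields dilute the weight of each term."""
    return 1.0 / math.sqrt(num_terms)


def toy_score(freq, num_docs, doc_freq, num_terms):
    """Simplified product of the three factors (illustrative only)."""
    return (term_frequency(freq)
            * inverse_document_frequency(num_docs, doc_freq)
            * field_norm(num_terms))


short_title = toy_score(freq=1, num_docs=1000, doc_freq=10, num_terms=4)
long_body = toy_score(freq=1, num_docs=1000, doc_freq=10, num_terms=400)
common_term = toy_score(freq=1, num_docs=1000, doc_freq=900, num_terms=4)
print(short_title, long_body, common_term)
```

With these made-up numbers, a rare term in a short title scores higher than the same term in a long body, and higher than a very common term in the same title.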
  • and Facets
  • Facets on Amazon
  • > GET /_search -d { "facets": { "home_team": { "terms": { "field": "field_home_team" } } } }
    Give your facet a name (here: home_team)
    Your facet filter can be: ● Terms ● Range ● Histogram ● Date Histogram ● Filter ● Query ● Statistical ● Terms Stats ● Geo Distance
  • "facets" : { "home_team" : { "_type" : "terms", "missing" : 203, "total" : 100, "other" : 42, "terms" : [ { "term" : "hou", "count" : 8 }, { "term" : "sln", "count" : 6 }, ...
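The counters in that response can be understood with a small in-memory imitation of a terms facet. This is a sketch under stated assumptions: the field is single-valued, `terms_facet` is a hypothetical helper, the sample documents (including the team code "nya") are invented, and the reading of `missing`, `total` and `other` is my interpretation of the response shape rather than something the slides spell out.

```python
from collections import Counter


def terms_facet(docs, field, size=2):
    """Count field values, keep the top `size` terms, summarise the rest."""
    values = [doc[field] for doc in docs if doc.get(field) is not None]
    top = Counter(values).most_common(size)
    return {
        "_type": "terms",
        "missing": len(docs) - len(values),             # docs without the field
        "total": len(values),                           # values counted overall
        "other": len(values) - sum(c for _, c in top),  # not in the top terms
        "terms": [{"term": term, "count": count} for term, count in top],
    }


# Invented sample data for illustration.
docs = [
    {"field_home_team": "hou"}, {"field_home_team": "hou"},
    {"field_home_team": "hou"}, {"field_home_team": "sln"},
    {"field_home_team": "sln"}, {"field_home_team": "nya"},
    {"title": "no team on this one"},
]
facet = terms_facet(docs, "field_home_team")
print(facet)
```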
  • STOP! I want this in Drupal?
  • Available modules: Elasticsearch, Elasticsearch Connector, Search API elasticsearch
  • Development directions: 1. Search API implementation 2. Field Storage API 3. Alternative backends
  • Field Storage API implementation
    Elasticsearch field storage sandbox by Damien Tournoud, started in July 2011
    Elasticsearch EntityFieldQuery sandbox: https://drupal.org/sandbox/asgorobets/2073151
  • Let's DEMO
  • Let the Search be with you