Montreal Elasticsearch Meetup
Elasticsearch Montreal Meetup, March 12th

  1. Who am I?
     Loïc Bertron, Director of Research & Development @Cedrom-SNI
     Working on Big Data for Cedrom-SNI: social media, TV & radio aggregation
     Introduced Elasticsearch at Cedrom-SNI
     Cedrom-SNI: 10k+ different sources, 750k+ new docs/day
     Our job: ingesting, enriching, extracting analytics and intelligence from docs
     loic.bertron@cedrom-sni.com | linkedin.com/in/loicbertron | @loicbertron
  2. Elasticsearch
     « Elasticsearch offers advanced search features to any application or website, easily, and scales to large amounts of data. »
  3. What is Elasticsearch?
     Simple: plug & play, schema free, RESTful API
     Elastic: automatically discovers all other instances
     Strong: replication & load balancing, scales massively, Lucene based
     Fast: requests executed in parallel, real time
     Full featured: search, analytics, facets, percolator, geo search, suggest, plugins…
  4. Document as JSON
     • Object representing your data
     • Grouped in an index
     • One index can have multiple types of documents
     {
       "message": "Introducing #ElasticSearch",
       "post_date": "2014-03-12T18:30:00",
       "author": { "first_name": "Loïc", "email": "loic.bertron@cedrom-sni.com" },
       "employee_at_Cedrom": true,
       "Tags": ["Meetup", "Montreal"]
     }
  5. API
     • REST API: http://host:port/[index]/[type]/[_action/id]
     • HTTP methods: GET, POST, PUT, DELETE
     • Documents
       • http://node1:9200/twitter/tweet/1 (POST)
       • http://node1:9200/twitter/tweet/1 (GET)
       • http://node1:9200/twitter/tweet/1 (DELETE)
     • Search
       • http://node1:9200/twitter/tweet/_search (GET)
       • http://node1:9200/twitter/_search (GET)
       • http://node1:9200/_search (GET)
     • Metadata
       • http://node1:9200/twitter/_status (GET)
       • http://node1:9200/_shutdown (POST)
  6. Index a document
     $ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
         "user": "loicbertron",
         "post_date": "2014-03-12T18:30:00",
         "message": "Introducing #ElasticSearch"
     }'
  7. Index a document
     { "ok": true, "_index": "twitter", "_type": "tweet", "_id": "1", "_version": 1 }
  8. Update a document
     Indexing again under the same id replaces the whole document and increments its _version.
     $ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
         "user": "loicbertron",
         "post_date": "2014-03-12T18:40:00",
         "message": "Introducing #ElasticSearch to the #Community"
     }'
  9. Update a document
     { "ok": true, "_index": "twitter", "_type": "tweet", "_id": "1", "_version": 2 }
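The same indexing call can be issued from code instead of curl. Below is a minimal sketch with Python's standard library that builds, but does not send, the PUT request from slide 6. The host `node1:9200` comes from the slides; the helper name `index_request` is hypothetical.

```python
import json
import urllib.request

# Assumption: node1:9200 is the node address used on the slides; nothing is sent here.
BASE_URL = "http://node1:9200"


def index_request(index, doc_type, doc_id, doc):
    """Build (but do not send) a PUT /index/type/id request carrying a JSON document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


tweet = {
    "user": "loicbertron",
    "post_date": "2014-03-12T18:30:00",
    "message": "Introducing #ElasticSearch",
}
req = index_request("twitter", "tweet", 1, tweet)
print(req.get_method(), req.full_url)
# Against a running node, urllib.request.urlopen(req) would send it.
```

Calling `index_request` again with id 1 and a new body is the update from slide 8.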
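  10. Search for documents
      $ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
          "query": {
              "match": { "message": "ElasticSearch" }
          }
      }'
      (A match query analyzes "ElasticSearch" the same way the message field was indexed, so it matches the lowercased token; a term query skips analysis and would look for the exact token "ElasticSearch".)
      $ curl -XGET 'http://node1:9200/twitter/tweet/_search?q=elasticsearch'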
  11. Search for documents
      {
        "took": 24,
        "timed_out": false,
        "_shards": { "total": 2, "successful": 2, "failed": 0 },
        "hits": {
          "total": 1,
          "max_score": 0.227,
          "hits": [ {
            "_index": "twitter",
            "_type": "tweet",
            "_id": "1",
            "_score": 0.227,
            "_source": {
              "user": "loicbertron",
              "post_date": "2014-03-12T18:40:00",
              "message": "Introducing #ElasticSearch to the #Community"
            }
          } ]
        }
      }
      Annotated fields:
      • "took": execution time (in milliseconds)
      • "hits.total": number of documents matching
      • "_index", "_type", "_id": infos about the hit
      • "_score": score
      • "_source": the document
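A client usually reads only the fields annotated on the search-response slides. Here is a minimal Python sketch that parses the slide 11 response with the standard `json` module. It assumes the response shape shown on the slide (later Elasticsearch releases changed some fields, for example `hits.total`), and the `summarize` helper is hypothetical.

```python
import json

# The response body from slide 11, reformatted but otherwise unchanged.
RESPONSE = """
{
  "took": 24,
  "timed_out": false,
  "_shards": {"total": 2, "successful": 2, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.227,
    "hits": [
      {
        "_index": "twitter", "_type": "tweet", "_id": "1", "_score": 0.227,
        "_source": {
          "user": "loicbertron",
          "post_date": "2014-03-12T18:40:00",
          "message": "Introducing #ElasticSearch to the #Community"
        }
      }
    ]
  }
}
"""


def summarize(raw):
    """Extract execution time, match count and (index, type, id, score, message) per hit."""
    resp = json.loads(raw)
    hits = resp["hits"]
    return {
        "took_ms": resp["took"],
        "total": hits["total"],
        "results": [
            (h["_index"], h["_type"], h["_id"], h["_score"], h["_source"]["message"])
            for h in hits["hits"]
        ],
    }


summary = summarize(RESPONSE)
print(summary["took_ms"], summary["total"])
for row in summary["results"]:
    print(row)
```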
  17. Search operand
      Terms        quebec
                   quebec ontario
      Phrases      "city of montréal"
      Proximity    "montreal collusion"~5
      Fuzzy        schwarzenegger~0.8
      Wildcards    queb*
      Boosting     Quebec^5 montreal
      Range        [2011/03/12 TO 2014/03/12]
                   [java TO json]
      Boolean      quebec AND NOT montreal
                   +quebec -montreal
                   (quebec OR ottawa) AND NOT toronto
      Fields       title:montreal^10 OR body:montreal
      $ curl -XGET 'http://node1:9200/twitter/tweet/_search?q=<Your Query>'
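When a query string goes into the `q` parameter, characters such as `:`, `^`, spaces and a leading `+` have to be URL-encoded; an unencoded `+` in `+quebec` would typically be decoded as a space. Here is a small Python sketch using `urllib.parse.urlencode`. `search_url` is a hypothetical helper, and no request is sent.

```python
from urllib.parse import urlencode

# Assumption: same node and index/type as the slides; nothing is sent.
SEARCH_ENDPOINT = "http://node1:9200/twitter/tweet/_search"


def search_url(query_string):
    """Put a Lucene query string into ?q=, percent-encoding reserved characters."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query_string})}"


for q in ['title:montreal^10 OR body:montreal', '"montreal collusion"~5', '+quebec -montreal']:
    print(search_url(q))
```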
  18. Query DSL
      $ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
        "query": {
          "filtered": {
            "query": {
              "bool": {
                "must": [
                  { "match": { "author.first_name": { "query": "loic", "fuzziness": 0.1 } } },
                  { "multi_match": { "query": "elasticsearch", "fields": ["title^10", "body"] } }
                ]
              }
            },
            "filter": {
              "and": [
                { "terms": { "tags": ["search", "scale", "store"] } },
                { "range": { "created_at": { "from": "2013" } } },
                { "term": { "featured": true } }
              ]
            }
          }
        }
      }'
      Annotated parts:
      • a fuzzy match on "author.first_name"
      • a multi_match on "title" (boosted 10x) and "body"
      • filters (terms on "tags", range on "created_at", term on "featured") that restrict the results without affecting the score
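Building the body as data before serializing it makes structural mistakes easier to catch. Here is a Python sketch of the slide 18 query. Both `must` clauses go into one list, because a JSON object that repeats a key is ambiguous and many parsers, Python's included, keep only the last value. The `filtered` query and `and` filter are the 1.x-era syntax used in the talk; later versions express the same idea with `bool` and `filter`.

```python
import json

# Scoring clauses: both must match.
must_clauses = [
    {"match": {"author.first_name": {"query": "loic", "fuzziness": 0.1}}},
    {"multi_match": {"query": "elasticsearch", "fields": ["title^10", "body"]}},
]
# Non-scoring restrictions, combined with "and".
filters = [
    {"terms": {"tags": ["search", "scale", "store"]}},
    {"range": {"created_at": {"from": "2013"}}},
    {"term": {"featured": True}},
]
query = {
    "query": {
        "filtered": {
            "query": {"bool": {"must": must_clauses}},
            "filter": {"and": filters},
        }
    }
}
body = json.dumps(query, indent=2)
print(body)

# Why "must" is a list: with a repeated key, Python's json keeps only the last value.
dup = json.loads('{"must": {"a": 1}, "must": {"b": 2}}')
print(dup)  # {'must': {'b': 2}}
```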
  22. Facets
  23. Facets
      [Figure: examples of term and range facets]
  24. Tag cloud
      $ curl -XPOST http://node1:9200/articles/_search -d '{
          "aggregations": {
              "tag_cloud": { "terms": { "field": "tags" } }
          }
      }'
      "aggregations": {
          "tag_cloud": {
              "buckets": [
                  { "key": "Quebec", "doc_count": 5 },
                  { "key": "Montréal", "doc_count": 3 },
                  ...
              ]
          }
      }
  25. Stats
      $ curl -XPOST 'http://node1:9200/students/_search?search_type=count' -d '{
          "facets": {
              "scores-per-subject": {
                  "terms_stats": { "key_field": "subject", "value_field": "score" }
              }
          }
      }'
      "facets": {
          "scores-per-subject": {
              "_type": "terms_stats",
              "missing": 0,
              "terms": [
                  { "term": "math", "count": 4, "total_count": 4, "min": 25.0, "max": 92.0, "total": 267.0, "mean": 66.75 },
                  […]
              ]
          }
      }
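What the terms_stats facet computes can be written out by hand. Here is a Python sketch over hypothetical student documents. The four math scores are invented so that the result matches the math entry on slide 25 (count 4, min 25, max 92, total 267, mean 66.75), and `terms_stats` is a hypothetical helper, not the facet's implementation.

```python
from collections import defaultdict

# Hypothetical documents; the math scores are chosen to reproduce slide 25.
students = [
    {"subject": "math", "score": 25.0},
    {"subject": "math", "score": 92.0},
    {"subject": "math", "score": 70.0},
    {"subject": "math", "score": 80.0},
    {"subject": "history", "score": 60.0},
]


def terms_stats(docs, key_field, value_field):
    """Group docs by key_field and compute count/min/max/total/mean of value_field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(doc[value_field])
    stats = []
    for term, values in groups.items():
        total = sum(values)
        stats.append({
            "term": term,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "total": total,
            "mean": total / len(values),
        })
    # Most frequent term first.
    return sorted(stats, key=lambda s: s["count"], reverse=True)


for entry in terms_stats(students, "subject", "score"):
    print(entry)
```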
  26. Advanced facets: Aggregations
      {
        "rank": "21",
        "city": "Boston",
        "state": "MA",
        "population2012": "636479",
        "population2010": "617594",
        "land_area": "48.277",
        "density": "12793",
        "ansi": "619463",
        "location": { "lat": "42.332", "lon": "71.0202" }
      }
  27. Advanced facets: Aggregations
      curl -XGET "node1:9200/cities/_search?pretty" -d '{
        "aggs": {
          "mean_density_by_state": {
            "terms": { "field": "state" },
            "aggs": {
              "mean_density": { "avg": { "field": "density" } }
            }
          }
        }
      }'
  28. Advanced facets: Aggregations
      "aggregations": {
        "mean_density_by_state": {
          "buckets": [
            { "key": "CA", "doc_count": 69, "mean_density": { "value": 5558.623188405797 } },
            { "key": "TX", "doc_count": 32, "mean_density": { "value": 2496.625 } },
            { "key": "FL", "doc_count": 20, "mean_density": { "value": 4006.6 } },
            { "key": "CO", "doc_count": 11, …
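A terms aggregation with an `avg` sub-aggregation amounts to grouping, then averaging inside each group. Here is a Python sketch over hypothetical city documents: only Boston's density comes from slide 26, while "City A" and "City B" are invented. The sketch converts `density` with `float()` because the sample document stores numbers as strings; on the Elasticsearch side the field would presumably need a numeric mapping for `avg` to apply.

```python
from collections import defaultdict

# Hypothetical documents shaped like slide 26 (numbers stored as strings).
cities = [
    {"city": "Boston", "state": "MA", "density": "12793"},
    {"city": "City A", "state": "MA", "density": "3207"},
    {"city": "City B", "state": "CO", "density": "4000"},
]


def mean_density_by_state(docs):
    """Bucket docs by 'state', then average 'density' inside each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["state"]].append(float(doc["density"]))
    result = [
        {"key": state, "doc_count": len(values), "mean_density": sum(values) / len(values)}
        for state, values in buckets.items()
    ]
    # Terms buckets are ordered by doc_count, largest first.
    result.sort(key=lambda b: b["doc_count"], reverse=True)
    return result


for bucket in mean_density_by_state(cities):
    print(bucket)
```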
  29. Facets
      [Figure: examples of term and range facets]
  30. Facets
      • Terms
      • Terms Stats
      • Statistical
      • Range
      • Histogram
      • Date Histogram
      • Filter
      • Query
      • Geo Distance
  31. Architecture
      [Figure: a single node, Node 1, forms the cluster; cluster state: green. Creating an index with 2 shards and 1 replica puts Shard 0 and Shard 1 on Node 1. A replica cannot live on the same node as its primary, so the replicas stay unassigned and the cluster state turns yellow.]
      $ curl -XPUT localhost:9200/twitter -d '{
          "index": { "number_of_shards": 2, "number_of_replicas": 1 }
      }'
  32. Architecture
      [Figure: adding a second node. Node 2 receives the other copy of Shard 0 and Shard 1, every copy is assigned, and the cluster state turns green.]
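The green/yellow/red states follow from where shard copies can be placed: red when a primary is unassigned, yellow when all primaries are assigned but some replica is not, green when everything is assigned. Here is a toy Python model of slides 31 and 32. The allocator is hypothetical and much simpler than Elasticsearch's; it keeps only the rule that two copies of the same shard never share a node.

```python
def allocate(num_shards, num_replicas, nodes):
    """Place each shard's copies on distinct nodes; copy 0 is the primary.

    Returns {(shard, copy): node_name or None}; None means unassigned.
    """
    allocation = {}
    for shard in range(num_shards):
        # Rotate the node list per shard so primaries spread across nodes.
        k = shard % len(nodes) if nodes else 0
        order = nodes[k:] + nodes[:k]
        for copy in range(num_replicas + 1):
            # Each copy takes a different node; extra copies stay unassigned.
            allocation[(shard, copy)] = order[copy] if copy < len(order) else None
    return allocation


def health(allocation):
    """red: a primary is unassigned; yellow: a replica is; green: all assigned."""
    if any(node is None for (_, copy), node in allocation.items() if copy == 0):
        return "red"
    if any(node is None for node in allocation.values()):
        return "yellow"
    return "green"


print(health(allocate(2, 1, ["node1"])))           # yellow
print(health(allocate(2, 1, ["node1", "node2"])))  # green
print(health(allocate(2, 1, [])))                  # red
```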
  33. Architecture
      [Figure: Node 3 and then Node 4 join the cluster and the shard copies are rebalanced. With three nodes, Node 2 still holds a copy of both Shard 0 and Shard 1; once Node 4 joins, each of the four nodes holds exactly one of the four copies (Shard 0 and Shard 1, each with one replica).]
  39. Architecture
      $ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
          "user": "loicbertron",
          "post_date": "2014-03-12T18:30:00",
          "message": "Introducing #ElasticSearch"
      }'
      [Figure: Doc 1 is routed to Shard 0, written to the primary of Shard 0 and then replicated to its replica; the node then answers:]
      { "ok": true, "_index": "twitter", "_type": "tweet", "_id": "1", "_version": 1 }
  44. Architecture
      $ curl -X PUT http://node1:9200/twitter/tweet/2 -d '{
          "user": "loicbertron",
          "post_date": "2014-03-12T18:45:00",
          "message": "The crowd is on fire #ElasticSearch"
      }'
      [Figure: Doc 2 is routed to Shard 1 and written to both copies of Shard 1; the node then answers:]
      { "ok": true, "_index": "twitter", "_type": "tweet", "_id": "2", "_version": 1 }
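Which shard receives Doc 1, Doc 2 or Doc 3 is decided by a rule of the form `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document `_id`. Here is a Python sketch of that rule. `zlib.crc32` stands in for Elasticsearch's own hash function, so the shard numbers it prints will not match a real cluster; `shard_for` is a hypothetical helper.

```python
import zlib

NUMBER_OF_PRIMARY_SHARDS = 2  # "number_of_shards": 2 in the slide 31 settings


def shard_for(routing, num_shards=NUMBER_OF_PRIMARY_SHARDS):
    """shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in hash; Elasticsearch uses a different function.
    """
    return zlib.crc32(str(routing).encode("utf-8")) % num_shards


for doc_id in ("1", "2", "3"):
    # Routing defaults to the _id, so the same id always lands on the same shard.
    print(f"Doc {doc_id} -> Shard {shard_for(doc_id)}")
```

Because the modulus is the number of primary shards, that number is fixed when the index is created in the versions discussed here; changing it would send existing ids to different shards.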
  49. Architecture
      $ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
          "query": {
              "match": { "message": "ElasticSearch" }
          }
      }'
      [Figure: the search request is sent to one copy of each shard, Shard 0 (Doc 1) and Shard 1 (Doc 2), executed in parallel, and the partial results are merged into a single response.]
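Slide 49 shows a search fanning out to the shards and coming back as one response. Here is a simplified Python sketch of the merge step on the node that received the request: each shard returns its own top hits and the coordinating node keeps the global best. The per-shard scores are made up, `merge_top_hits` is hypothetical, and the sketch ignores the separate fetch phase that loads `_source`.

```python
import heapq

# Hypothetical per-shard results: each shard returns its own top hits as (score, doc_id).
shard_results = {
    "shard0": [(0.31, "3"), (0.22, "1")],
    "shard1": [(0.27, "2")],
}


def merge_top_hits(per_shard, size):
    """Coordinator step: keep the global top `size` hits across all shards."""
    all_hits = (hit for hits in per_shard.values() for hit in hits)
    return heapq.nlargest(size, all_hits)


print(merge_top_hits(shard_results, 2))  # [(0.31, '3'), (0.27, '2')]
```

Each shard has to return up to `size` hits, not `size` divided by the number of shards, because any single shard may hold all of the global best matches.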
  55. Architecture
      [Figure: Node 1 leaves the cluster, taking one copy of Shard 0 with it. The surviving copy of Shard 0 (Doc 1) keeps the data available, becoming the primary if needed, and a new replica of Shard 0 is rebuilt on one of the remaining nodes.]
  58. Architecture
      $ curl -X PUT http://node2:9200/twitter/tweet/3 -d '{
          "user": "loicbertron",
          "post_date": "2014-03-12T19:00:00",
          "message": "A third message about #ElasticSearch"
      }'
      [Figure: Doc 3 is routed to Shard 0 and written to both of its copies; the node then answers:]
      { "ok": true, "_index": "twitter", "_type": "tweet", "_id": "3", "_version": 1 }
  62. Architecture
      $ curl -XGET http://node2:9200/twitter/tweet/_search -d '{
          "query": {
              "match": { "message": "ElasticSearch" }
          }
      }'
      [Figure: the search now covers Doc 1, Doc 2 and Doc 3 on the remaining nodes.]
  64. Architecture
      [Figure: only Node 2 and Node 4 are shown, each holding a copy of Shard 1 (Doc 2).]
  65. How do users see search?
      [Figure: User -> Query -> Result: a list of results]
  66. How does a search engine work?
      1. Fetch the document field
      2. Pick the configured analyzer
      3. Parse the text into tokens
      4. Apply token filters
      5. Store into the index
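Steps 1 to 5 end in an inverted index: a map from each token to the documents that contain it. Here is a toy Python sketch. Its `analyze` function (split on non-word characters, then lowercase) is a hypothetical stand-in for a real analyzer such as `standard`.

```python
import re
from collections import defaultdict


def analyze(text):
    """Steps 2-4 in miniature: split on non-word characters and lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs):
    """Step 5: map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


docs = {
    "1": "Introducing #ElasticSearch to the #Community",
    "2": "The crowd is on fire #ElasticSearch",
}
index = build_index(docs)
print(sorted(index["elasticsearch"]))  # ['1', '2']
print(sorted(index["crowd"]))          # ['2']
```

At query time the same analysis runs on the query text, which is why searching for "ElasticSearch" finds the lowercased token.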
  67. Analyzer
      curl -XGET "http://localhost:9200/docs/_analyze?analyzer=standard&pretty=1" -d "Édith Piaf vedette du feu d'artifice"
  68. Analyzer
      { "tokens": [
        { "token": "édith",      "start_offset": 0,  "end_offset": 5,  "type": "<ALPHANUM>", "position": 1 },
        { "token": "piaf",       "start_offset": 6,  "end_offset": 10, "type": "<ALPHANUM>", "position": 2 },
        { "token": "vedette",    "start_offset": 11, "end_offset": 18, "type": "<ALPHANUM>", "position": 3 },
        { "token": "du",         "start_offset": 19, "end_offset": 21, "type": "<ALPHANUM>", "position": 4 },
        { "token": "feu",        "start_offset": 22, "end_offset": 25, "type": "<ALPHANUM>", "position": 5 },
        { "token": "d'artifice", "start_offset": 26, "end_offset": 36, "type": "<ALPHANUM>", "position": 6 }
      ] }
  69. Analyzer
      Composed of a single tokenizer and zero or more filters
  70. Tokenizer
      Cuts a string into words and transforms them:
      • Whitespace tokenizer: «Édith piaf» -> «Édith», «piaf»
      • Standard tokenizer: «Édith piaf!» -> «Édith», «piaf» (punctuation is dropped; the lowercase «édith» produced by the standard analyzer comes from its lowercase filter)
  71. Filters
      Modify, delete or add tokens:
      • ASCII folding filter: «Édith Piaf» -> «Edith Piaf»
      • Stemmer filter (English): «stemming» -> «stem»; «fishing», «fished», «fisher» -> «fish»; «cats», «catlike» -> «cat»
      • Phonetic filter: «quick» -> «Q200», «quik» -> «Q200»
      • Edge n-gram filter: «Montreal» -> [«Mon», «Mont», «Montr»]
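Three of the filters above can be approximated in a few lines of Python. These are assumptions, not the filters' implementations: `ascii_fold` uses Unicode decomposition rather than the filter's own mapping table, `min_gram=3` and `max_gram=5` were chosen to reproduce the edge n-gram example, and since the «Q200» codes on the slide are Soundex codes, the phonetic example uses American Soundex.

```python
import unicodedata


def ascii_fold(token):
    """Approximate ASCII folding: decompose accents (NFD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_gram=3, max_gram=5):
    """Prefixes of the token between min_gram and max_gram characters long."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"), "L": "4", **dict.fromkeys("MN", "5"), "R": "6",
}


def soundex(word):
    """American Soundex: first letter plus three digits for the following consonants."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = code
    return (result + "000")[:4]


print(ascii_fold("Édith Piaf"))           # Edith Piaf
print(edge_ngrams("Montreal"))            # ['Mon', 'Mont', 'Montr']
print(soundex("quick"), soundex("quik"))  # Q200 Q200
```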
  72. Analyzer
      { "tokens": [
        { "token": "edith",   "start_offset": 0,  "end_offset": 5,  "type": "<ALPHANUM>", "position": 1 },
        { "token": "piaf",    "start_offset": 6,  "end_offset": 10, "type": "<ALPHANUM>", "position": 2 },
        { "token": "vedet",   "start_offset": 11, "end_offset": 18, "type": "<ALPHANUM>", "position": 3 },
        { "token": "feu",     "start_offset": 22, "end_offset": 25, "type": "<ALPHANUM>", "position": 5 },
        { "token": "artific", "start_offset": 26, "end_offset": 36, "type": "<ALPHANUM>", "position": 6 }
      ] }
  73. Regular search engine usage
      1. Documents get indexed
      2. I come back often to the search page to run my request
      3. I hope that my document will be ranked well enough to appear at the top of the results page
      4. If not, I will never see my document
  74. Percolator
      1. Register my query
      2. When a document gets indexed, the percolator looks for a match against the registered queries
  75. Percolator
      Real-time updates!
  76. Percolator
      curl -XPUT 'http://node1:9200/twitter/.percolator/elasticsearch' -d '{
          "query": {
              "match": { "message": "elasticsearch" }
          }
      }'
  77. Percolator
      $ curl -X GET http://node1:9200/twitter/tweet/_percolate -d '{
          "doc": {
              "user": "loicbertron",
              "post_date": "2014-03-12T19:00:00",
              "message": "A third message about #ElasticSearch"
          }
      }'
  78. Percolator
      {
          "took": 19,
          "_shards": { "total": 5, "successful": 5, "failed": 0 },
          "total": 1,
          "matches": [ { "_index": "twitter", "_id": "elasticsearch" } ]
      }
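The percolator inverts the usual flow: queries are stored, and each new document is checked against them. Here is a toy Python model of slides 76 to 78. The `Percolator` class and its token matching are hypothetical simplifications of a `match` query, where any query token may match, as with its default OR operator.

```python
import re


def tokens(text):
    """Toy analyzer: the set of lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


class Percolator:
    """Stores queries instead of documents and matches incoming documents against them."""

    def __init__(self):
        self.queries = {}  # query id -> (field, query tokens)

    def register(self, query_id, field, text):
        """Roughly the role of PUT /twitter/.percolator/<id> on slide 76."""
        self.queries[query_id] = (field, tokens(text))

    def percolate(self, doc):
        """Return the ids of registered queries sharing a token with the doc's field."""
        return [
            query_id
            for query_id, (field, wanted) in self.queries.items()
            if wanted & tokens(doc.get(field, ""))
        ]


percolator = Percolator()
percolator.register("elasticsearch", "message", "elasticsearch")
percolator.register("hadoop", "message", "hadoop")  # hypothetical second query
doc = {
    "user": "loicbertron",
    "post_date": "2014-03-12T19:00:00",
    "message": "A third message about #ElasticSearch",
}
print(percolator.percolate(doc))  # ['elasticsearch']
```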
  79. Inner objects
      {
        "name": "Jules Verne",
        "biography": "One of the greatest authors",
        "books": [
          { "title": "Vingt mille lieues sous les mers", "genre": "Novel", "publisher": "Hetzel" },
          { "title": "Les Châteaux en Californie", "genre": "Drama", "publisher": "Marc Soriano" }
        ]
      }
  80. Parents / Children
      curl -XPUT node1:9200/authors/bare_author/1 -d '{
          "name": "Jules Verne",
          "biography": "One of the greatest authors"
      }'
      curl -XPOST 'node1:9200/authors/book/1?parent=1' -d '{
          "title": "Les Châteaux en Californie",
          "genre": "Drama",
          "publisher": "Marc Soriano"
      }'
      curl -XPOST 'node1:9200/authors/book/2?parent=1' -d '{
          "title": "Vingt mille lieues sous les mers",
          "genre": "Novel",
          "publisher": "Hetzel"
      }'
      (The ?parent=1 parameter requires the book type to be mapped with "_parent": { "type": "bare_author" } before the children are indexed.)
  81. Other features
      • Suggest API: "Did you mean?", autocomplete, …
      • Result highlighting
      • More like this
      • Data backup: snapshot / restore to
        • File system
        • Amazon S3
        • HDFS
        • Google Compute Engine
        • Microsoft Azure
      • Hadoop connector
  82. Clients
      • Perl
      • Python
      • Ruby
      • PHP
      • JavaScript
      • Java
      • .NET
      • Scala
      • Clojure
      • Erlang
      • EventMachine
      • Bash
      • OCaml
      • Smalltalk
      • ColdFusion
  83. Who's using it?
  84. Questions
  85. Thank you
      Thank you David Pilato for his presentation: https://speakerdeck.com/dadoonet/tours-jug-elasticsearch
      Thank you Kevin Kluge for his presentation: https://speakerdeck.com/elasticsearch/elasticsearch-in-20-minutes
  86. Bonus :)
  87. Suggest
      curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
        "suggest": {
          "my-title-suggestions-1": {
            "text": "devloping",
            "term": {
              "size": 3,
              "field": "title"
            }
          }
        }
      }'
  88. Suggest
      "suggest": {
        "my-title-suggestions-1": [
          {
            "text": "devloping",
            "offset": 0,
            "length": 9,
            "options": [
              { "text": "developing", "freq": 77, "score": 0.8888889 },
              { "text": "deloping", "freq": 1, "score": 0.875 },
              { "text": "deploying", "freq": 2, "score": 0.7777778 }
            ]
          }
        ]
      }
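The scores on slide 88 fit a simple formula: one minus the Levenshtein edit distance divided by the length of the shorter word. Here is a Python sketch that checks this. The formula was inferred from the three numbers on the slide, not taken from Lucene's source, and both function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def suggestion_score(text, candidate):
    """1 - distance / length of the shorter word."""
    return 1 - levenshtein(text, candidate) / min(len(text), len(candidate))


for candidate in ("developing", "deloping", "deploying"):
    print(candidate, round(suggestion_score("devloping", candidate), 7))
# developing 0.8888889
# deloping 0.875
# deploying 0.7777778
```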
  89. More Like This
      curl -XGET 'http://node1:9200/twitter/tweet/1/_mlt?mlt_fields=tag,content&min_doc_freq=1'
      {
        "more_like_this": {
          "fields": ["name.first", "name.last"],
          "like_text": "text like this one",
          "min_term_freq": 1,
          "max_query_terms": 12,
          "percent_terms_to_match": 0.95
        }
      }
  90. Highlight
  91. Highlight
      {
        "query": {...},
        "highlight": {
          "number_of_fragments": 3,
          "fragment_size": 150,
          "tag_schema": "styled",
          "fields": {
            "_all": { "pre_tags": ["<em>"], "post_tags": ["</em>"] },
            "bio.title": { "number_of_fragments": 0 },
            "bio.author": { "number_of_fragments": 0 },
            "bio.content": { "number_of_fragments": 5, "order": "score" }
          }
        }
      }
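A highlighter returns the matching text wrapped in `pre_tags` and `post_tags`. Here is a regex-based Python sketch of that output only. The real highlighters work from analyzed tokens and offsets, and this sketch ignores `fragment_size` and `number_of_fragments`; the `highlight` function is hypothetical.

```python
import re


def highlight(text, terms, pre_tag="<em>", post_tag="</em>"):
    """Wrap every whole-word, case-insensitive occurrence of a query term in tags."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("Introducing #ElasticSearch to the #Community", ["elasticsearch"]))
# Introducing #<em>ElasticSearch</em> to the #Community
```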
  92. Hadoop
  93. Hadoop
      • Java library for integrating Elasticsearch and Hadoop
      • Pig, Hive, Cascading, MapReduce
      • Search and real-time analytics with Elasticsearch, Hadoop as a data lake
      • Scales with Hadoop
