03. Elasticsearch: Data In, Data Out
http://elastic.openthinklabs.com/
What Is a Document?
{
"name":"John Smith",
"age":42,
"confirmed":true,
"join_date":"2014-06-01",
"home":{
"lat":51.5,
"lon":0.1
},
"accounts":[
{
"type":"facebook",
"id":"johnsmith"
},
{
"type":"twitter",
"id":"johnsmith"
}
]
}
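A document is a JSON object, which may contain nested objects and arrays. As a small sketch, Python's standard `json` module maps the example above onto a dict (objects), lists (arrays), and `True`/`False` (booleans), and round-trips it without loss:

```python
import json

# The example document as a Python dict: nested objects become dicts,
# arrays become lists, and JSON true/false become True/False.
document = {
    "name": "John Smith",
    "age": 42,
    "confirmed": True,
    "join_date": "2014-06-01",
    "home": {"lat": 51.5, "lon": 0.1},
    "accounts": [
        {"type": "facebook", "id": "johnsmith"},
        {"type": "twitter", "id": "johnsmith"},
    ],
}

# Serialize to the JSON text that would be sent as a request body.
body = json.dumps(document)

# Parsing it back yields an equal structure: nothing is lost in the round trip.
parsed = json.loads(body)
```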
Document Metadata
● _index :: The collection of documents that should be grouped together for a common reason
● _type :: The class of object that the document represents
● _id :: The unique identifier for the document
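Together, these three metadata elements identify a document and map onto the URL path used by the requests that follow: /{_index}/{_type}/{_id}. A minimal sketch of building that path; `document_path` is a hypothetical helper, and its index-name check follows the reference guide's rule that index names must be lowercase, must not begin with an underscore, and must not contain commas:

```python
from urllib.parse import quote


def document_path(index, doc_type, doc_id=None):
    """Build /{_index}/{_type}/{_id}, or /{_index}/{_type}/ when no ID is given.

    Hypothetical helper for illustration only.
    """
    if index != index.lower() or index.startswith("_") or "," in index:
        raise ValueError(f"invalid index name: {index!r}")
    # Percent-encode each segment so IDs containing '/' or spaces stay one segment.
    path = f"/{quote(index, safe='')}/{quote(doc_type, safe='')}/"
    if doc_id is not None:
        path += quote(str(doc_id), safe="")
    return path


# An uppercase index name is rejected by the check above.
try:
    document_path("Website", "blog", "1")
    rejected = False
except ValueError:
    rejected = True
```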
Indexing a Document
Using Our Own ID
PUT /website/blog/123
{
"title": "My first blog entry",
"text": "Just trying this out...",
"date": "2014/01/01"
}
{
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 1,
"created": true
}
Index request (first block); Elasticsearch responds (second block)
PUT verb: store this document at this URL
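The same PUT can be issued from a program. A minimal sketch using only Python's standard library; it assumes a node at http://localhost:9200 (the address used by the curl examples in this deck), only builds the request without sending it, and `build_index_request` is a hypothetical helper name:

```python
import json
import urllib.request

# Assumption: a local node listening on the default port 9200.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT that stores `document` under our own ID."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_index_request("website", "blog", "123", {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01",
})
# Sending it would be urllib.request.urlopen(req), which needs a running node.
```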
Indexing a Document
Autogenerating IDs
POST /website/blog/
{
"title": "My second blog entry",
"text": "Still trying this out...",
"date": "2014/01/01"
}
{
"_index": "website",
"_type": "blog",
"_id": "AVeTjE9FnhloyZ20gpEj",
"_version": 1,
"created": true
}
Index request (first block); Elasticsearch responds (second block)
POST verb: store this document under this URL
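The autogenerated ID in the response (AVeTjE9FnhloyZ20gpEj) is a 20-character, URL-safe string. Elasticsearch generates it on the server. The sketch below is not its algorithm; it only illustrates producing IDs of the same shape on the client, for example to PUT documents with self-chosen IDs. `generate_id` is a hypothetical helper:

```python
import secrets
import string

# Characters of the URL-safe base64 alphabet.
URL_SAFE = set(string.ascii_letters + string.digits + "-_")


def generate_id():
    """Return a random 20-character URL-safe ID (15 random bytes, base64url-encoded)."""
    return secrets.token_urlsafe(15)


new_id = generate_id()
```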
Retrieving a Document
GET /website/blog/123?pretty
{
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 1,
"found": true,
"_source": {
"title": "My first blog entry",
"text": "Just trying this out...",
"date": "2014/01/01"
}
}
curl -i -XGET http://localhost:9200/website/blog/124?pretty
HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 83
{
"_index" : "website",
"_type" : "blog",
"_id" : "124",
"found" : false
}
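Both responses carry a "found" flag, so a client can branch on it instead of on the HTTP status alone. A minimal sketch; `get_source` is a hypothetical helper that takes the raw response text:

```python
import json


def get_source(response_text):
    """Return the _source of a GET response, or None when "found" is false.

    Note: urllib.request.urlopen raises HTTPError on a 404; the error body
    (readable with err.read()) still contains the "found": false JSON.
    """
    response = json.loads(response_text)
    if not response.get("found", False):
        return None
    return response["_source"]


found_text = ('{"_index": "website", "_type": "blog", "_id": "123", "_version": 1, '
              '"found": true, "_source": {"title": "My first blog entry"}}')
missing_text = '{"_index": "website", "_type": "blog", "_id": "124", "found": false}'
```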
Retrieving Part of a Document
GET /website/blog/123?_source=title,text
{
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 1,
"found": true,
"_source": {
"text": "Just trying this out...",
"title": "My first blog entry"
}
}
GET /website/blog/123/_source
{
"title": "My first blog entry",
"text": "Just trying this out...",
"date": "2014/01/01"
}
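The `_source=title,text` parameter asks Elasticsearch to return only the listed fields. The sketch below reproduces that effect locally for top-level fields only; the real parameter also supports more, such as nested paths, which this hypothetical `filter_source` helper ignores:

```python
def filter_source(source, fields):
    """Keep only the requested top-level fields, like ?_source=title,text."""
    wanted = {field.strip() for field in fields.split(",")}
    return {key: value for key, value in source.items() if key in wanted}


source = {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01",
}
```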
Checking Whether a Document Exists
A HEAD request returns only the status and headers: 200 OK if the document exists, 404 Not Found if it does not.
curl -I http://localhost:9200/website/blog/123
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
curl -I http://localhost:9200/website/blog/124
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
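The same existence check from Python's standard library, as a sketch that builds the HEAD request and interprets its status. The helper names are hypothetical, and the node address is assumed to be localhost:9200:

```python
import urllib.request


def build_head_request(url):
    """HEAD asks for the status line and headers only; no body is returned."""
    return urllib.request.Request(url, method="HEAD")


def exists_from_status(status):
    """Map the status of a HEAD on a document URL to True/False."""
    if status == 200:
        return True
    if status == 404:
        return False
    raise ValueError(f"unexpected status {status}")


req = build_head_request("http://localhost:9200/website/blog/123")
# Against a running node: urllib.request.urlopen(req) returns a response whose
# .status is 200, or raises urllib.error.HTTPError whose .code is 404.
```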
Updating a Whole Document
● Documents in Elasticsearch are immutable; we cannot change them. Instead, to update an existing document, we reindex or replace it, which we can do using the same index API.
PUT /website/blog/123
{
"title": "My first blog entry",
"text": "I am starting to get the hang of this...",
"date": "2014/01/02"
}
{
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 2,
"created": false
}
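The response shows the two observable effects of reindexing: `_version` goes from 1 to 2, and `created` is false because a document with that ID already existed. A toy in-memory model of just those two effects (not Elasticsearch code; `DocumentStore` is hypothetical):

```python
class DocumentStore:
    """Toy in-memory model of reindexing; illustrates _version and created only."""

    def __init__(self):
        self._docs = {}  # _id -> (_version, _source)

    def index(self, doc_id, source):
        created = doc_id not in self._docs
        old_version = 0 if created else self._docs[doc_id][0]
        # Reindexing replaces the whole _source and increments _version.
        self._docs[doc_id] = (old_version + 1, dict(source))
        return {"_id": doc_id, "_version": old_version + 1, "created": created}

    def get(self, doc_id):
        return self._docs[doc_id][1]


store = DocumentStore()
first = store.index("123", {"title": "My first blog entry",
                            "text": "Just trying this out...", "date": "2014/01/01"})
second = store.index("123", {"title": "My first blog entry",
                             "text": "I am starting to get the hang of this...",
                             "date": "2014/01/02"})
```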
Creating a New Document
POST /website/blog/
{ ... }
PUT /website/blog/123?op_type=create
{ ... }
PUT /website/blog/123/_create
{ ... }
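1. POST with an autogenerated ID: the ID is always new, so the document is always created.
2. ?op_type=create: with our own ID, the request is accepted only if no document with the same _index, _type, and _id exists yet.
3. /_create: the same check as op_type=create, expressed as an endpoint.
If a document with that ID already exists, the request fails with 409 Conflict: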
PUT /website/blog/123?op_type=create
{
"title": "My first blog entry",
"text": "Just trying this out...",
"date": "2014/01/01"
}
{
"error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]",
"status": 409
}
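The create-only rule can be modeled in a few lines: a create succeeds for a new ID and is rejected, without overwriting anything, for an existing one. A toy sketch returning `(status, body)`; `create` is a hypothetical helper, not the Elasticsearch API:

```python
def create(docs, doc_id, source):
    """Toy model of op_type=create / _create: returns (status, body)."""
    if doc_id in docs:
        # An existing document is never overwritten; the request is rejected.
        return 409, {"error": "document already exists", "status": 409}
    docs[doc_id] = dict(source)
    return 201, {"_id": doc_id, "_version": 1, "created": True}


docs = {}
first = create(docs, "123", {"title": "My first blog entry"})
second = create(docs, "123", {"title": "Overwrite attempt"})
```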
Deleting a Document
DELETE /website/blog/123
{
"found": true,
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 3
}
Deleting a document that does not exist returns 404 Not Found, with "found": false in the body:
DELETE /website/blog/123
{
"found": false,
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 1
}
Dealing with Conflicts
[Figure: Consequence of no concurrency control]
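The consequence in question can be reproduced without Elasticsearch. If two clients read the same document, change their own copies, and write them back, the second write silently overwrites the first (a lost update). A plain-Python sketch; `stock_count` is a hypothetical field chosen for illustration:

```python
# Both clients read the same document.
document = {"stock_count": 100}
client_a_copy = dict(document)  # A sees stock_count = 100
client_b_copy = dict(document)  # B also sees stock_count = 100

# Each client sells one item and decrements its private copy.
client_a_copy["stock_count"] -= 1
client_b_copy["stock_count"] -= 1

# Without concurrency control, the last write wins.
document = client_a_copy  # A writes back 99
document = client_b_copy  # B writes back 99: A's sale is lost (expected 98)
```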
Optimistic Concurrency Control
PUT /website/blog/1/_create
{
"title": "My first blog entry",
"text": "Just trying this out..."
}
GET /website/blog/1
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"title": "My first blog entry",
"text": "Just trying this out..."
}
}
PUT /website/blog/1?version=1
{
"title": "My first blog entry",
"text": "Starting to get the hang of this..."
}
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 2,
"created": false
}
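1. Create the document; it starts at _version 1.
2. Retrieve it; the response reports the current _version.
3. Reindex with ?version=1: the write succeeds only if the current _version is still 1. If another change has incremented it in the meantime, Elasticsearch rejects the request with 409 Conflict.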
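The version check in step 3 is a compare-and-set: the write carries the version the client last read and is accepted only if that version is still current. A toy model of that rule (not Elasticsearch code; `VersionedStore` is hypothetical):

```python
class VersionedStore:
    """Toy model of internal versioning with ?version=N."""

    def __init__(self):
        self.docs = {}  # _id -> {"_version": int, "_source": dict}

    def index(self, doc_id, source, version=None):
        current = self.docs.get(doc_id)
        current_version = current["_version"] if current else 0
        # With ?version=N, the write is accepted only if N equals the current _version.
        if version is not None and version != current_version:
            return 409
        self.docs[doc_id] = {"_version": current_version + 1, "_source": dict(source)}
        return 201 if current is None else 200


store = VersionedStore()
created = store.index("1", {"title": "My first blog entry",
                            "text": "Just trying this out..."})
updated = store.index("1", {"title": "My first blog entry",
                            "text": "Starting to get the hang of this..."}, version=1)
stale = store.index("1", {"title": "Stale edit"}, version=1)  # _version is now 2
```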
Using Versions from an External System
PUT /website/blog/2?version=5&version_type=external
{
"title": "My first external blog entry",
"text": "Starting to get the hang of this..."
}
{
"_index": "website",
"_type": "blog",
"_id": "2",
"_version": 5,
"created": true
}
PUT /website/blog/2?version=10&version_type=external
{
"title": "My first external blog entry",
"text": "This is a piece of cake..."
}
{
"_index": "website",
"_type": "blog",
"_id": "2",
"_version": 10,
"created": false
}
PUT /website/blog/2?version=10&version_type=external
{
"title": "My first external blog entry",
"text": "This is a piece of cake..."
}
{
"error": "VersionConflictEngineException[[website][3] [blog][2]: version conflict, current [10], provided [10]]",
"status": 409
}
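1. Create the document with external version 5; Elasticsearch stores 5 as its _version.
2. Reindex with version 10: this succeeds because 10 is greater than the current version 5.
3. Repeat the request with version 10: this fails with 409 Conflict, because an external version must be strictly greater than the current one.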
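With `version_type=external` the rule changes from "equal to the current version" to "greater than the current version", and the supplied number becomes the stored `_version`. A toy model reproducing the three steps above (`index_external` is hypothetical):

```python
def index_external(docs, doc_id, source, version):
    """Toy model of version_type=external: returns (status, _version)."""
    current = docs.get(doc_id)
    if current is not None and version <= current["_version"]:
        # The external version must be strictly greater than the stored one.
        return 409, current["_version"]
    docs[doc_id] = {"_version": version, "_source": dict(source)}
    return (201 if current is None else 200), version


docs = {}
results = [
    index_external(docs, "2", {"text": "Starting to get the hang of this..."}, 5),
    index_external(docs, "2", {"text": "This is a piece of cake..."}, 10),
    index_external(docs, "2", {"text": "This is a piece of cake..."}, 10),
]
```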
Partial Updates to Documents
POST /website/blog/1/_update
{
"doc" : {
"tags" : [ "testing" ],
"views": 0
}
}
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 3
}
GET /website/blog/1
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 3,
"found": true,
"_source": {
"title": "My first blog entry",
"text": "Starting to get the hang of this...",
"views": 0,
"tags": [
"testing"
]
}
}
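1. The doc object is merged into the existing document: new fields are added and existing fields are overwritten.
2. Retrieving the document shows the merged result at _version 3.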
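Because documents are immutable, the update API still retrieves the existing `_source`, changes it, and reindexes it. The merge step can be sketched as follows: objects are merged recursively, while scalars and arrays in the partial doc replace existing values. `merge_partial` is a hypothetical helper, not Elasticsearch code:

```python
import copy


def merge_partial(source, doc):
    """Merge a partial `doc` into a copy of `source` and return the result."""
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: merge them field by field.
            merged[key] = merge_partial(merged[key], value)
        else:
            # New fields are added; existing scalars and arrays are replaced.
            merged[key] = copy.deepcopy(value)
    return merged


existing = {"title": "My first blog entry", "text": "Starting to get the hang of this..."}
merged = merge_partial(existing, {"tags": ["testing"], "views": 0})
```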
Using Scripts to Make Partial Updates
POST /website/blog/1/_update
{
"script" : "ctx._source.views+=1"
}
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 4
}
POST /website/blog/1/_update
{
"script" : "ctx._source.tags+=new_tag",
"params" : {
"new_tag" : "search"
}
}
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 5
}
GET /website/blog/1
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 6,
"found": true,
"_source": {
"title": "My first blog entry",
"text": "Starting to get the hang of this...",
"views": 1,
"tags": [
"testing",
"search"
]
}
}
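1. ctx._source.views+=1 increments the views counter.
2. The value to append to tags is passed as the new_tag parameter instead of being hard-coded, so the script text stays the same for any tag.
3. Retrieving the document shows both changes.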
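The scripts operate on a `ctx` object whose `_source` is the stored document, with `params` supplying values. A toy Python model of the two scripts above; the functions are hypothetical Python counterparts, not the scripting language Elasticsearch runs:

```python
def run_script(source, script, params=None):
    """Toy model: expose the document as ctx["_source"] and let the script mutate it."""
    ctx = {"_source": source}
    script(ctx, params or {})
    return ctx["_source"]


def increment_views(ctx, params):
    # Counterpart of: ctx._source.views+=1
    ctx["_source"]["views"] += 1


def add_tag(ctx, params):
    # Counterpart of: ctx._source.tags+=new_tag, with new_tag taken from params
    ctx["_source"]["tags"].append(params["new_tag"])


doc = {"title": "My first blog entry", "views": 0, "tags": ["testing"]}
doc = run_script(doc, increment_views)
doc = run_script(doc, add_tag, {"new_tag": "search"})
```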
Using Scripts to Make Partial Updates
POST /website/blog/1/_update
{
"script" : "ctx.op = ctx._source.views == count ? 'delete' : 'none'",
"params" : {
"count": 1
}
}
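Delete a document based on its contents by setting ctx.op to 'delete'; setting it to 'none' leaves the document unchanged. Here views equals count (1), so the document is deleted: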
GET /website/blog/1
{
"_index": "website",
"_type": "blog",
"_id": "1",
"found": false
}
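Extending the same toy model, `ctx.op` decides what happens after the script runs. In this sketch the model starts with `ctx["op"]` set to `"index"` (an assumption of the toy model); `scripted_update` and `delete_if_views_equal` are hypothetical:

```python
def scripted_update(docs, doc_id, script, params):
    """Toy model of ctx.op: 'delete' removes the document, 'none' leaves it alone."""
    ctx = {"_source": docs[doc_id], "op": "index"}  # assumption: default op is "index"
    script(ctx, params)
    if ctx["op"] == "delete":
        del docs[doc_id]
    elif ctx["op"] == "index":
        docs[doc_id] = ctx["_source"]
    return ctx["op"]


def delete_if_views_equal(ctx, params):
    # Counterpart of: ctx.op = ctx._source.views == count ? 'delete' : 'none'
    ctx["op"] = "delete" if ctx["_source"]["views"] == params["count"] else "none"


docs = {"1": {"views": 1}, "2": {"views": 5}}
op_1 = scripted_update(docs, "1", delete_if_views_equal, {"count": 1})
op_2 = scripted_update(docs, "2", delete_if_views_equal, {"count": 1})
```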
Updating a Document That May Not Yet Exist
POST /website/pageviews/1/_update
{
"script" : "ctx._source.views+=1",
"upsert": {
"views": 1
}
}
{
"_index": "website",
"_type": "pageviews",
"_id": "1",
"_version": 1
}
GET /website/pageviews/1
{
"_index": "website",
"_type": "pageviews",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"views": 1
}
}
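On the first request the document does not exist, so the `upsert` body is stored as-is and the script is not run; that is why `views` is 1 rather than 2. Later requests run the script. A toy model of that branch (`update_or_upsert` is hypothetical):

```python
def update_or_upsert(docs, doc_id, script, upsert):
    """Toy model of an update with an upsert body."""
    if doc_id not in docs:
        # Missing document: store the upsert body as-is; the script is not run.
        docs[doc_id] = dict(upsert)
    else:
        script(docs[doc_id])
    return docs[doc_id]


def increment_views(source):
    # Counterpart of: ctx._source.views+=1
    source["views"] += 1


pageviews = {}
first = dict(update_or_upsert(pageviews, "1", increment_views, {"views": 1}))
second = dict(update_or_upsert(pageviews, "1", increment_views, {"views": 1}))
```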
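Update and Conflicts
The retry_on_conflict parameter sets how many times the update is retried automatically when a version conflict occurs before the request fails.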
POST /website/pageviews/1/_update?retry_on_conflict=5
{
"script" : "ctx._source.views+=1",
"upsert": {
"views": 0
}
}
{
"_index": "website",
"_type": "pageviews",
"_id": "1",
"_version": 2,
"found": true,
"_source": {
"views": 2
}
}
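`retry_on_conflict` amounts to a read-modify-write loop: read the document and its version, apply the change, write it back conditionally, and on a conflict start over from a fresh read. A toy sketch; `ToyStore` injects simulated concurrent writers, and all names here are hypothetical:

```python
class VersionConflict(Exception):
    """Stands in for a 409 version conflict."""


class ToyStore:
    """One document with a version; `interference` simulates concurrent writers."""

    def __init__(self, source, interference=0):
        self.version = 1
        self.source = dict(source)
        self.interference = interference

    def get(self):
        return self.version, dict(self.source)

    def put_if_version(self, expected_version, source):
        if self.interference > 0:
            # Another client updates the document between our read and our write.
            self.interference -= 1
            self.version += 1
            self.source["views"] += 1
        if expected_version != self.version:
            raise VersionConflict()
        self.version += 1
        self.source = source


def update_with_retry(store, change, retry_on_conflict=5):
    """Read-modify-write; on a conflict, re-read and retry up to retry_on_conflict times."""
    for _ in range(retry_on_conflict + 1):
        version, source = store.get()
        change(source)
        try:
            store.put_if_version(version, source)
            return store.version
        except VersionConflict:
            continue
    raise VersionConflict()


def increment_views(source):
    source["views"] += 1


store = ToyStore({"views": 0}, interference=2)
final_version = update_with_retry(store, increment_views)  # succeeds on the 3rd attempt

busy = ToyStore({"views": 0}, interference=3)
try:
    update_with_retry(busy, increment_views, retry_on_conflict=1)
    gave_up = False
except VersionConflict:
    gave_up = True
```

No increment is lost: the two simulated writers and our own update together leave `views` at 3.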
Retrieving Multiple Documents
GET /_mget
{
"docs" : [
{
"_index" : "website",
"_type" : "blog",
"_id" : 2
},
{
"_index" : "website",
"_type" : "pageviews",
"_id" : 1,
"_source": "views"
}
]
}
{
"docs": [
{
"_index": "website",
"_type": "blog",
"_id": "2",
"_version": 10,
"found": true,
"_source": {
"title": "My first external blog entry",
"text": "This is a piece of cake..."
}
},
{
"_index": "website",
"_type": "pageviews",
"_id": "1",
"_version": 3,
"found": true,
"_source": {
"views": 3
}
}
]
}
Retrieving Multiple Documents
GET /website/blog/_mget
{
"docs" : [
{ "_id" : 2 },
{ "_type" : "pageviews", "_id" : 1 }
]
}
{
"docs": [
{
"_index": "website",
"_type": "blog",
"_id": "2",
"_version": 10,
"found": true,
"_source": {
"title": "My first external blog entry",
"text": "This is a piece of cake..."
}
},
{
"_index": "website",
"_type": "pageviews",
"_id": "1",
"_version": 3,
"found": true,
"_source": {
"views": 3
}
}
]
}
Retrieving Multiple Documents
GET /website/blog/_mget
{
"ids" : [ "2", "1" ]
}
{
"docs": [
{
"_index": "website",
"_type": "blog",
"_id": "2",
"_version": 10,
"found": true,
"_source": {
"title": "My first external blog entry",
"text": "This is a piece of cake..."
}
},
{
"_index": "website",
"_type": "blog",
"_id": "1",
"found": false
}
]
}
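An mget response lists one entry per requested document, in request order, each with its own "found" flag; a missing document does not fail the whole request. A sketch that builds the `ids` body and splits a response into hits and misses (both helpers are hypothetical):

```python
import json


def ids_body(ids):
    """Body for GET /{index}/{type}/_mget when index and type come from the URL."""
    return json.dumps({"ids": [str(i) for i in ids]})


def split_mget(response):
    """Split an mget response into {_id: _source} for hits and a list of missing IDs."""
    found, missing = {}, []
    for doc in response["docs"]:
        if doc.get("found"):
            found[doc["_id"]] = doc["_source"]
        else:
            missing.append(doc["_id"])
    return found, missing


response = json.loads("""{"docs": [
  {"_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true,
   "_source": {"title": "My first external blog entry", "text": "This is a piece of cake..."}},
  {"_index": "website", "_type": "blog", "_id": "1", "found": false}
]}""")
found, missing = split_mget(response)
```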
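Cheaper in Bulk
The bulk request body has the following format:
{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
...
Each line, including the last, must end with a newline character (\n), so the JSON on each line cannot be pretty-printed. A delete action has no request body line.
POST /_bulk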
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
{
"took": 4,
"errors": false,
"items": [
{
"delete": {
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 1,
"status": 404,
"found": false
}
},
{
"create": {
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 2,
"status": 201
}
},
{
"create": {
"_index": "website",
"_type": "blog",
"_id": "AVeVu4ZmPwPQAxVyMVtH",
"_version": 1,
"status": 201
}
},
{
"update": {
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 3,
"status": 200
}
}
]
}
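Producing the newline-delimited body by hand is error-prone, so a small builder helps. A sketch that serializes the same four actions as the request above; `build_bulk_body` is a hypothetical helper. `json.dumps` emits single-line JSON by default, which the format requires:

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, body) triples into the bulk format.

    `body` is None for delete, which has no request-body line.
    Every line, including the last one, ends with a newline.
    """
    lines = []
    for action, metadata, body in actions:
        lines.append(json.dumps({action: metadata}))
        if body is not None:
            lines.append(json.dumps(body))
    return "".join(line + "\n" for line in lines)


body = build_bulk_body([
    ("delete", {"_index": "website", "_type": "blog", "_id": "123"}, None),
    ("create", {"_index": "website", "_type": "blog", "_id": "123"},
     {"title": "My first blog post"}),
    ("index", {"_index": "website", "_type": "blog"},
     {"title": "My second blog post"}),
    ("update", {"_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict": 3},
     {"doc": {"title": "My updated blog post"}}),
])
```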
Cheaper in Bulk
POST /_bulk
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "Cannot create - it already exists" }
{ "index": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "But we can update it" }
{
"took": 2,
"errors": true,
"items": [
{
"create": {
"_index": "website",
"_type": "blog",
"_id": "123",
"status": 409,
"error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]"
}
},
{
"index": {
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 4,
"status": 200
}
}
]
}
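Each action in a bulk request is executed independently, so one failure does not undo the others: here the create fails while the index succeeds. The top-level "errors" flag only says that something failed, so a client has to inspect the items. A sketch using the response above (`failed_items` is hypothetical):

```python
import json


def failed_items(response):
    """Return (action, _id, error) for each bulk item that failed."""
    if not response.get("errors"):
        return []  # "errors": false means every item succeeded
    failures = []
    for item in response["items"]:
        for action, result in item.items():  # one key per item, e.g. "create"
            if "error" in result:
                failures.append((action, result["_id"], result["error"]))
    return failures


response = json.loads("""{
  "took": 2, "errors": true,
  "items": [
    {"create": {"_index": "website", "_type": "blog", "_id": "123", "status": 409,
                "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]"}},
    {"index": {"_index": "website", "_type": "blog", "_id": "123", "_version": 4, "status": 200}}
  ]
}""")
failures = failed_items(response)
```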
Don’t Repeat Yourself
POST /website/_bulk
{ "index": { "_type": "log" }}
{ "event": "User logged in" }
{
"took": 3,
"errors": false,
"items": [
{
"create": {
"_index": "website",
"_type": "log",
"_id": "AVeVyqWVPwPQAxVyMV3_",
"_version": 1,
"status": 201
}
}
]
}
Don’t Repeat Yourself
POST /website/log/_bulk
{ "index": {}}
{ "event": "User logged in" }
{ "index": { "_type": "blog" }}
{ "title": "Overriding the default type" }
{
"took": 2,
"errors": false,
"items": [
{
"create": {
"_index": "website",
"_type": "log",
"_id": "AVeVzBQjPwPQAxVyMV4_",
"_version": 1,
"status": 201
}
},
{
"create": {
"_index": "website",
"_type": "blog",
"_id": "AVeVzBQjPwPQAxVyMV5A",
"_version": 1,
"status": 201
}
}
]
}
How Big Is Too Big?
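There is no single answer. The best bulk size depends on hardware, document size and complexity, and load, so it is found by experiment: increase the batch size until throughput stops improving. A sketch that splits serialized bulk entries into batches capped by both document count and bytes; `chunk_bulk` is hypothetical, and its default limits are placeholder starting values, not tuned recommendations:

```python
def chunk_bulk(entries, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Group serialized bulk entries into request bodies that respect both limits.

    Each entry is the full text of one operation (its action line plus any
    body line, each ending in a newline), so an action is never separated
    from its body and every batch ends with a newline.
    """
    batches, current, current_bytes = [], [], 0
    for entry in entries:
        size = len(entry.encode("utf-8"))
        # Start a new batch when adding this entry would exceed either limit.
        if current and (len(current) >= max_docs or current_bytes + size > max_bytes):
            batches.append("".join(current))
            current, current_bytes = [], 0
        current.append(entry)
        current_bytes += size
    if current:
        batches.append("".join(current))
    return batches


entries = ['{"index": {}}\n{"n": %d}\n' % i for i in range(5)]
by_count = chunk_bulk(entries, max_docs=2)
by_size = chunk_bulk(entries, max_bytes=len(entries[0]))
```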
References
● Clinton Gormley & Zachary Tong, Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine, O’Reilly
