03. ElasticSearch : Data In, Data Out

OpenThink Labs
OpenThink LabsOpenThink Labs
ElasticSearch
Data In Data Out
http://elastic.openthinklabs.com/
What Is a Document?
{
"name":"John Smith",
"age":42,
"confirmed":true,
"join_date":"2014-06-01",
"home":{
"lat":51.5,
"lon":0.1
},
"accounts":[
{
"type":"facebook",
"id":"johnsmith"
},
{
"type":"twitter",
"id":"johnsmith"
}
]
}
Document Metadata
● _index :: Collection of documents that should
be grouped together for a common reason
● _type :: The class of object that the document
represents
● _id :: The unique identifier for the document
Indexing a Document
Using Our Own ID
PUT /website/blog/123
{
"title": "My first blog entry",
"text": "Just trying this out...",
"date": "2014/01/01"
}
{
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 1,
"created": true
}
Index request
Elasticsearch responds
PUT verb : store this document at this URL
Indexing a Document
Autogenerating IDs
POST /website/blog/
{
"title": "My second blog entry",
"text": "Still trying this out...",
"date": "2014/01/01"
}
{
"_index": "website",
"_type": "blog",
"_id": "AVeTjE9FnhloyZ20gpEj",
"_version": 1,
"created": true
}
Index request
Elasticsearch responds
POST verb : store this document under this URL
Retrieving a Document
GET /website/blog/123?pretty
{
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 1,
"found": true,
"_source": {
"title": "My first blog entry",
"text": "Just trying this out...",
"date": "2014/01/01"
}
}
curl -i -XGET http://localhost:9200/website/blog/124?pretty
HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 83
{
"_index" : "website",
"_type" : "blog",
"_id" : "124",
"found" : false
}
Retrieving Part of a Document
GET /website/blog/123?_source=title,text
{
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 1,
"found": true,
"_source": {
"text": "Just trying this out...",
"title": "My first blog entry"
}
}
GET /website/blog/123/_source
{
"title": "My first blog entry",
"text": "Just trying this out...",
"date": "2014/01/01"
}
Checking Whether a Document Exists
curl -i -IHEAD http://localhost:9200/website/blog/123
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
curl -i -IHEAD http://localhost:9200/website/blog/124
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
Updating a Whole Document
● Documents in Elasticsearch are immutable; we cannot
change them. Instead, if we need to update an existing
document, we reindex or replace it, which we can do using
the same index API
PUT /website/blog/123
{
"title": "My first blog entry",
"text": "I am starting to get the hang of this...",
"date": "2014/01/02"
} {
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 2,
"created": false
}
Creating a New Document
POST /website/blog/
{ ... }
PUT /website/blog/123?op_type=create
{ ... }
PUT /website/blog/123/_create
{ ... }
1
2
3
PUT /website/blog/123?op_type=create
{
"title": "My first blog entry",
"text": "Just trying this out...",
"date": "2014/01/01"
}
{
"error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]",
"status": 409
}
Deleting a Document
DELETE /website/blog/123
{
"found": true,
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 3
}
{
"found": false,
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 1
}
DELETE /website/blog/123
Dealing with Conflicts
Consequence of no concurrency control
Optimistic Concurrency Control
PUT /website/blog/1/_create
{
"title": "My first blog entry",
"text": "Just trying this out..."
}
GET /website/blog/1 {
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"title": "My first blog entry",
"text": "Just trying this out..."
}
}
PUT /website/blog/1?version=1
{
"title": "My first blog entry",
"text": "Starting to get the hang of this..."
}
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 2,
"created": false
}
1
2
3
Using Versions from an External System
PUT /website/blog/2?version=5&version_type=external
{
"title": "My first external blog entry",
"text": "Starting to get the hang of this..."
} {
"_index": "website",
"_type": "blog",
"_id": "2",
"_version": 5,
"created": true
}
PUT /website/blog/2?version=10&version_type=external
{
"title": "My first external blog entry",
"text": "This is a piece of cake..."
}
{
"_index": "website",
"_type": "blog",
"_id": "2",
"_version": 10,
"created": false
}
PUT /website/blog/2?version=10&version_type=external
{
"title": "My first external blog entry",
"text": "This is a piece of cake..."
}
{
"error": "VersionConflictEngineException[[website][3] [blog][2]: version conflict, current [10], provided [10]]",
"status": 409
}
1
2
3
Partial Updates to Documents
POST /website/blog/1/_update
{
"doc" : {
"tags" : [ "testing" ],
"views": 0
}
}
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 3
}
GET /website/blog/1
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 3,
"found": true,
"_source": {
"title": "My first blog entry",
"text": "Starting to get the hang of this...",
"views": 0,
"tags": [
"testing"
]
}
}
1
2
Using Scripts to Make Partial Updates
POST /website/blog/1/_update
{
"script" : "ctx._source.views+=1"
}
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 4
}
POST /website/blog/1/_update
{
"script" : "ctx._source.tags+=new_tag",
"params" : {
"new_tag" : "search"
}
}
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 5
}
GET /website/blog/1
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 6,
"found": true,
"_source": {
"title": "My first blog entry",
"text": "Starting to get the hang of this...",
"views": 1,
"tags": [
"testing",
"search"
]
}
}
1
2
3
Using Scripts to Make Partial Updates
POST /website/blog/1/_update
{
"script" : "ctx.op = ctx._source.views == count ? 'delete' : 'none'",
"params" : {
"count": 1
}
}
Delete a document based on its contents, by setting ctx.op to delete
GET /website/blog/1
{
"_index": "website",
"_type": "blog",
"_id": "1",
"found": false
}
Updating a Document That May Not Yet Exist
POST /website/pageviews/1/_update
{
"script" : "ctx._source.views+=1",
"upsert": {
"views": 1
}
}
{
"_index": "website",
"_type": "pageviews",
"_id": "1",
"_version": 1
}
GET /website/pageviews/1 {
"_index": "website",
"_type": "pageviews",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"views": 1
}
}
Update and Conflicts
POST /website/pageviews/1/_update?retry_on_conflict=5
{
"script" : "ctx._source.views+=1",
"upsert": {
"views": 0
}
}
{
"_index": "website",
"_type": "pageviews",
"_id": "1",
"_version": 2
"found": true,
"_source": {
"views": 2
}
}
Retrieving Multiple Documents
GET /_mget
{
"docs" : [
{
"_index" : "website",
"_type" : "blog",
"_id" : 2
},
{
"_index" : "website",
"_type" : "pageviews",
"_id" : 1,
"_source": "views"
}
]
}
{
"docs": [
{
"_index": "website",
"_type": "blog",
"_id": "2",
"_version": 10,
"found": true,
"_source": {
"title": "My first external blog entry",
"text": "This is a piece of cake..."
}
},
{
"_index": "website",
"_type": "pageviews",
"_id": "1",
"_version": 3,
"found": true,
"_source": {
"views": 3
}
}
]
}
Retrieving Multiple Documents
GET /website/blog/_mget
{
"docs" : [
{ "_id" : 2 },
{ "_type" : "pageviews", "_id" : 1 }
]
}
{
"docs": [
{
"_index": "website",
"_type": "blog",
"_id": "2",
"_version": 10,
"found": true,
"_source": {
"title": "My first external blog entry",
"text": "This is a piece of cake..."
}
},
{
"_index": "website",
"_type": "pageviews",
"_id": "1",
"_version": 3,
"found": true,
"_source": {
"views": 3
}
}
]
}
Retrieving Multiple Documents
GET /website/blog/_mget
{
"ids" : [ "2", "1" ]
}
{
"docs": [
{
"_index": "website",
"_type": "blog",
"_id": "2",
"_version": 10,
"found": true,
"_source": {
"title": "My first external blog entry",
"text": "This is a piece of cake..."
}
},
{
"_index": "website",
"_type": "blog",
"_id": "1",
"found": false
}
]
}
Cheaper in Bulk
{ action: { metadata }}n
{ request body }n
{ action: { metadata }}n
{ request body }n
...
The bulk request body has the following format :
POST /_bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
{
"took": 4,
"errors": false,
"items": [
{
"delete": {
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 1,
"status": 404,
"found": false
}
},
{
"create": {
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 2,
"status": 201
}
},
{
"create": {
"_index": "website",
"_type": "blog",
"_id": "AVeVu4ZmPwPQAxVyMVtH",
"_version": 1,
"status": 201
}
},
{
"update": {
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 3,
"status": 200
}
}
]
}
Cheaper in Bulk
POST /_bulk
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "Cannot create - it already exists" }
{ "index": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "But we can update it" }
{
"took": 2,
"errors": true,
"items": [
{
"create": {
"_index": "website",
"_type": "blog",
"_id": "123",
"status": 409,
"error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]"
}
},
{
"index": {
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 4,
"status": 200
}
}
]
}
Don’t Repeat Yourself
POST /website/_bulk
{ "index": { "_type": "log" }}
{ "event": "User logged in" } {
"took": 3,
"errors": false,
"items": [
{
"create": {
"_index": "website",
"_type": "log",
"_id": "AVeVyqWVPwPQAxVyMV3_",
"_version": 1,
"status": 201
}
}
]
}
Don’t Repeat Yourself
POST /website/log/_bulk
{ "index": {}}
{ "event": "User logged in" }
{ "index": { "_type": "blog" }}
{ "title": "Overriding the default type" }
{
"took": 2,
"errors": false,
"items": [
{
"create": {
"_index": "website",
"_type": "log",
"_id": "AVeVzBQjPwPQAxVyMV4_",
"_version": 1,
"status": 201
}
},
{
"create": {
"_index": "website",
"_type": "blog",
"_id": "AVeVzBQjPwPQAxVyMV5A",
"_version": 1,
"status": 201
}
}
]
}
How Big Is Too Big ?
Referensi
● ElasticSearch, The Definitive Guide, A Distrib
uted Real-Time Search and Analytics Engine, Cl
inton Gormely & Zachary Tong, O’Reilly
1 of 28

More Related Content

Recently uploaded(20)

MOSORE_BRESCIAMOSORE_BRESCIA
MOSORE_BRESCIA
Federico Karagulian5 views
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptx
NeethaSherra15 views
Journey of Generative AIJourney of Generative AI
Journey of Generative AI
thomasjvarghese4918 views
How Leaders See Data? (Level 1)How Leaders See Data? (Level 1)
How Leaders See Data? (Level 1)
Narendra Narendra10 views
PTicketInput.pdfPTicketInput.pdf
PTicketInput.pdf
stuartmcphersonflipm314 views
Introduction to Microsoft Fabric.pdfIntroduction to Microsoft Fabric.pdf
Introduction to Microsoft Fabric.pdf
ishaniuudeshika21 views
PROGRAMME.pdfPROGRAMME.pdf
PROGRAMME.pdf
HiNedHaJar14 views
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East River
ErickANDRADE9011 views
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docxRIO GRANDE SUPPLY COMPANY INC, JAYSON.docx
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docx
JaysonGarabilesEspej6 views
Data structure and algorithm. Data structure and algorithm.
Data structure and algorithm.
Abdul salam 12 views

03. ElasticSearch : Data In, Data Out

  • 1. ElasticSearch Data In Data Out http://elastic.openthinklabs.com/
  • 2. What Is a Document? { "name":"John Smith", "age":42, "confirmed":true, "join_date":"2014-06-01", "home":{ "lat":51.5, "lon":0.1 }, "accounts":[ { "type":"facebook", "id":"johnsmith" }, { "type":"twitter", "id":"johnsmith" } ] }
  • 3. Document Metadata ● _index :: Collection of documents that should be grouped together for a common reason ● _type :: The class of object that the document represents ● _id :: The unique identifier for the document
  • 4. Indexing a Document Using Our Own ID PUT /website/blog/123 { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" } { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "created": true } Index request Elasticsearch responds PUT verb : store this document at this URL
  • 5. Indexing a Document Autogenerating IDs POST /website/blog/ { "title": "My second blog entry", "text": "Still trying this out...", "date": "2014/01/01" } { "_index": "website", "_type": "blog", "_id": "AVeTjE9FnhloyZ20gpEj", "_version": 1, "created": true } Index request Elasticsearch responds POST verb : store this document under this URL
  • 6. Retrieving a Document GET /website/blog/123?pretty { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "found": true, "_source": { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" } } curl -i -XGET http://localhost:9200/website/blog/124?pretty HTTP/1.1 404 Not Found Content-Type: application/json; charset=UTF-8 Content-Length: 83 { "_index" : "website", "_type" : "blog", "_id" : "124", "found" : false }
  • 7. Retrieving Part of a Document GET /website/blog/123?_source=title,text { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "found": true, "_source": { "text": "Just trying this out...", "title": "My first blog entry" } } GET /website/blog/123/_source { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" }
  • 8. Checking Whether a Document Exists curl -i -IHEAD http://localhost:9200/website/blog/123 HTTP/1.1 200 OK Content-Type: text/plain; charset=UTF-8 Content-Length: 0 curl -i -IHEAD http://localhost:9200/website/blog/124 HTTP/1.1 404 Not Found Content-Type: text/plain; charset=UTF-8 Content-Length: 0
  • 9. Updating a Whole Document ● Documents in Elasticsearch are immutable; we cannot change them. Instead, if we need to update an existing document, we reindex or replace it, which we can do using the same index API PUT /website/blog/123 { "title": "My first blog entry", "text": "I am starting to get the hang of this...", "date": "2014/01/02" } { "_index": "website", "_type": "blog", "_id": "123", "_version": 2, "created": false }
  • 10. Creating a New Document POST /website/blog/ { ... } PUT /website/blog/123?op_type=create { ... } PUT /website/blog/123/_create { ... } 1 2 3 PUT /website/blog/123?op_type=create { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" } { "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]", "status": 409 }
  • 11. Deleting a Document DELETE /website/blog/123 { "found": true, "_index": "website", "_type": "blog", "_id": "123", "_version": 3 } { "found": false, "_index": "website", "_type": "blog", "_id": "123", "_version": 1 } DELETE /website/blog/123
  • 12. Dealing with Conflicts Consequence of no concurrency control
  • 13. Optimistic Concurrency Control PUT /website/blog/1/_create { "title": "My first blog entry", "text": "Just trying this out..." } GET /website/blog/1 { "_index": "website", "_type": "blog", "_id": "1", "_version": 1, "found": true, "_source": { "title": "My first blog entry", "text": "Just trying this out..." } } PUT /website/blog/1?version=1 { "title": "My first blog entry", "text": "Starting to get the hang of this..." } { "_index": "website", "_type": "blog", "_id": "1", "_version": 2, "created": false } 1 2 3
  • 14. Using Versions from an External System PUT /website/blog/2?version=5&version_type=external { "title": "My first external blog entry", "text": "Starting to get the hang of this..." } { "_index": "website", "_type": "blog", "_id": "2", "_version": 5, "created": true } PUT /website/blog/2?version=10&version_type=external { "title": "My first external blog entry", "text": "This is a piece of cake..." } { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "created": false } PUT /website/blog/2?version=10&version_type=external { "title": "My first external blog entry", "text": "This is a piece of cake..." } { "error": "VersionConflictEngineException[[website][3] [blog][2]: version conflict, current [10], provided [10]]", "status": 409 } 1 2 3
  • 15. Partial Updates to Documents POST /website/blog/1/_update { "doc" : { "tags" : [ "testing" ], "views": 0 } } { "_index": "website", "_type": "blog", "_id": "1", "_version": 3 } GET /website/blog/1 { "_index": "website", "_type": "blog", "_id": "1", "_version": 3, "found": true, "_source": { "title": "My first blog entry", "text": "Starting to get the hang of this...", "views": 0, "tags": [ "testing" ] } } 1 2
  • 16. Using Scripts to Make Partial Updates POST /website/blog/1/_update { "script" : "ctx._source.views+=1" } { "_index": "website", "_type": "blog", "_id": "1", "_version": 4 } POST /website/blog/1/_update { "script" : "ctx._source.tags+=new_tag", "params" : { "new_tag" : "search" } } { "_index": "website", "_type": "blog", "_id": "1", "_version": 5 } GET /website/blog/1 { "_index": "website", "_type": "blog", "_id": "1", "_version": 6, "found": true, "_source": { "title": "My first blog entry", "text": "Starting to get the hang of this...", "views": 1, "tags": [ "testing", "search" ] } } 1 2 3
  • 17. Using Scripts to Make Partial Updates POST /website/blog/1/_update { "script" : "ctx.op = ctx._source.views == count ? 'delete' : 'none'", "params" : { "count": 1 } } Delete a document based on its contents, by setting ctx.op to delete GET /website/blog/1 { "_index": "website", "_type": "blog", "_id": "1", "found": false }
  • 18. Updating a Document That May Not Yet Exist POST /website/pageviews/1/_update { "script" : "ctx._source.views+=1", "upsert": { "views": 1 } } { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 1 } GET /website/pageviews/1 { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 1, "found": true, "_source": { "views": 1 } }
  • 19. Update and Conflicts POST /website/pageviews/1/_update?retry_on_conflict=5 { "script" : "ctx._source.views+=1", "upsert": { "views": 0 } } { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 2 "found": true, "_source": { "views": 2 } }
  • 20. Retrieving Multiple Documents GET /_mget { "docs" : [ { "_index" : "website", "_type" : "blog", "_id" : 2 }, { "_index" : "website", "_type" : "pageviews", "_id" : 1, "_source": "views" } ] } { "docs": [ { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true, "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } }, { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 3, "found": true, "_source": { "views": 3 } } ] }
  • 21. Retrieving Multiple Documents GET /website/blog/_mget { "docs" : [ { "_id" : 2 }, { "_type" : "pageviews", "_id" : 1 } ] } { "docs": [ { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true, "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } }, { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 3, "found": true, "_source": { "views": 3 } } ] }
  • 22. Retrieving Multiple Documents GET /website/blog/_mget { "ids" : [ "2", "1" ] } { "docs": [ { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true, "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } }, { "_index": "website", "_type": "blog", "_id": "1", "found": false } ] }
  • 23. Cheaper in Bulk { action: { metadata }}n { request body }n { action: { metadata }}n { request body }n ... The bulk request body has the following format : POST /_bulk { "delete": { "_index": "website", "_type": "blog", "_id": "123" }} { "create": { "_index": "website", "_type": "blog", "_id": "123" }} { "title": "My first blog post" } { "index": { "_index": "website", "_type": "blog" }} { "title": "My second blog post" } { "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} } { "doc" : {"title" : "My updated blog post"} } { "took": 4, "errors": false, "items": [ { "delete": { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "status": 404, "found": false } }, { "create": { "_index": "website", "_type": "blog", "_id": "123", "_version": 2, "status": 201 } }, { "create": { "_index": "website", "_type": "blog", "_id": "AVeVu4ZmPwPQAxVyMVtH", "_version": 1, "status": 201 } }, { "update": { "_index": "website", "_type": "blog", "_id": "123", "_version": 3, "status": 200 } } ] }
  • 24. Cheaper in Bulk POST /_bulk { "create": { "_index": "website", "_type": "blog", "_id": "123" }} { "title": "Cannot create - it already exists" } { "index": { "_index": "website", "_type": "blog", "_id": "123" }} { "title": "But we can update it" } { "took": 2, "errors": true, "items": [ { "create": { "_index": "website", "_type": "blog", "_id": "123", "status": 409, "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]" } }, { "index": { "_index": "website", "_type": "blog", "_id": "123", "_version": 4, "status": 200 } } ] }
  • 25. Don’t Repeat Yourself POST /website/_bulk { "index": { "_type": "log" }} { "event": "User logged in" } { "took": 3, "errors": false, "items": [ { "create": { "_index": "website", "_type": "log", "_id": "AVeVyqWVPwPQAxVyMV3_", "_version": 1, "status": 201 } } ] }
  • 26. Don’t Repeat Yourself POST /website/log/_bulk { "index": {}} { "event": "User logged in" } { "index": { "_type": "blog" }} { "title": "Overriding the default type" } { "took": 2, "errors": false, "items": [ { "create": { "_index": "website", "_type": "log", "_id": "AVeVzBQjPwPQAxVyMV4_", "_version": 1, "status": 201 } }, { "create": { "_index": "website", "_type": "blog", "_id": "AVeVzBQjPwPQAxVyMV5A", "_version": 1, "status": 201 } } ] }
  • 27. How Big Is Too Big ?
  • 28. Referensi ● ElasticSearch, The Definitive Guide, A Distrib uted Real-Time Search and Analytics Engine, Cl inton Gormely & Zachary Tong, O’Reilly