03. ElasticSearch : Data In, Data Out

Published in: Data & Analytics

  1. ElasticSearch: Data In, Data Out (http://elastic.openthinklabs.com/)
  2. What Is a Document?
        {
          "name": "John Smith",
          "age": 42,
          "confirmed": true,
          "join_date": "2014-06-01",
          "home": { "lat": 51.5, "lon": 0.1 },
          "accounts": [
            { "type": "facebook", "id": "johnsmith" },
            { "type": "twitter", "id": "johnsmith" }
          ]
        }
  3. Document Metadata
      ● _index :: Collection of documents that should be grouped together for a common reason
      ● _type :: The class of object that the document represents
      ● _id :: The unique identifier for the document
  4. Indexing a Document Using Our Own ID
      PUT verb: store this document at this URL.
      Index request:
        PUT /website/blog/123
        { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" }
      Elasticsearch responds:
        { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "created": true }
  5. Indexing a Document Autogenerating IDs
      POST verb: store this document under this URL.
      Index request:
        POST /website/blog/
        { "title": "My second blog entry", "text": "Still trying this out...", "date": "2014/01/01" }
      Elasticsearch responds:
        { "_index": "website", "_type": "blog", "_id": "AVeTjE9FnhloyZ20gpEj", "_version": 1, "created": true }
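The difference between the two verbs on slides 4 and 5 can be sketched in a few lines of Python. The helper name `index_request` is hypothetical (it is not part of any Elasticsearch client); it only builds the HTTP method and path and sends nothing over the network.

```python
from urllib.parse import quote


def index_request(index, doc_type, doc_id=None):
    """Return the (method, path) pair for indexing one document.

    With an ID we PUT to the exact URL; without one we POST to the
    type's URL and let Elasticsearch generate the ID.
    """
    base = f"/{quote(index, safe='')}/{quote(doc_type, safe='')}"
    if doc_id is None:
        return "POST", base + "/"
    return "PUT", f"{base}/{quote(str(doc_id), safe='')}"


print(index_request("website", "blog", 123))
print(index_request("website", "blog"))
```

Percent-encoding each path segment is a design choice of this sketch, so that an ID containing `/` cannot change which URL is addressed.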
  6. Retrieving a Document
      Request:
        GET /website/blog/123?pretty
      Response:
        {
          "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "found": true,
          "_source": { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" }
        }
      Requesting a document that does not exist:
        curl -i -XGET http://localhost:9200/website/blog/124?pretty
        HTTP/1.1 404 Not Found
        Content-Type: application/json; charset=UTF-8
        Content-Length: 83
        { "_index" : "website", "_type" : "blog", "_id" : "124", "found" : false }
  7. Retrieving Part of a Document
      Only selected fields of _source:
        GET /website/blog/123?_source=title,text
        {
          "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "found": true,
          "_source": { "text": "Just trying this out...", "title": "My first blog entry" }
        }
      Only the _source, without metadata:
        GET /website/blog/123/_source
        { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" }
  8. Checking Whether a Document Exists
      The document exists:
        curl -I http://localhost:9200/website/blog/123
        HTTP/1.1 200 OK
        Content-Type: text/plain; charset=UTF-8
        Content-Length: 0
      The document does not exist:
        curl -I http://localhost:9200/website/blog/124
        HTTP/1.1 404 Not Found
        Content-Type: text/plain; charset=UTF-8
        Content-Length: 0
  9. Updating a Whole Document
      ● Documents in Elasticsearch are immutable; we cannot change them. Instead, if we need to update an existing document, we reindex or replace it, which we can do using the same index API:
        PUT /website/blog/123
        { "title": "My first blog entry", "text": "I am starting to get the hang of this...", "date": "2014/01/02" }
      Response:
        { "_index": "website", "_type": "blog", "_id": "123", "_version": 2, "created": false }
  10. Creating a New Document
      (1) POST /website/blog/ { ... }
      (2) PUT /website/blog/123?op_type=create { ... }
      (3) PUT /website/blog/123/_create { ... }
      Request for a document that already exists:
        PUT /website/blog/123?op_type=create
        { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" }
      Response:
        { "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]", "status": 409 }
  11. Deleting a Document
      Request:
        DELETE /website/blog/123
      Response when the document is found:
        { "found": true, "_index": "website", "_type": "blog", "_id": "123", "_version": 3 }
      Response when the document is not found:
        { "found": false, "_index": "website", "_type": "blog", "_id": "123", "_version": 1 }
  12. Dealing with Conflicts: the consequence of no concurrency control
  13. Optimistic Concurrency Control
      (1) Create the document:
        PUT /website/blog/1/_create
        { "title": "My first blog entry", "text": "Just trying this out..." }
      (2) Retrieve it and note its _version:
        GET /website/blog/1
        {
          "_index": "website", "_type": "blog", "_id": "1", "_version": 1, "found": true,
          "_source": { "title": "My first blog entry", "text": "Just trying this out..." }
        }
      (3) Reindex it only if the current _version is still 1:
        PUT /website/blog/1?version=1
        { "title": "My first blog entry", "text": "Starting to get the hang of this..." }
      Response:
        { "_index": "website", "_type": "blog", "_id": "1", "_version": 2, "created": false }
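The version check on slide 13 can be illustrated with a toy in-memory store. This is a minimal sketch, not how Elasticsearch is implemented: `VersionedStore` and `VersionConflict` are hypothetical names, and the store only mimics the rule that a write carrying `version=N` succeeds when N is still the current version.

```python
class VersionConflict(Exception):
    """Raised when the caller's version does not match the stored one."""


class VersionedStore:
    """Toy stand-in for one index; maps id -> (version, source)."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def index(self, doc_id, source, version=None):
        current = self._docs.get(doc_id)
        current_version = current[0] if current else 0
        # With a version supplied, the write succeeds only if it is still current.
        if version is not None and version != current_version:
            raise VersionConflict(
                f"current [{current_version}], provided [{version}]"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


store = VersionedStore()
store.index("1", {"title": "My first blog entry", "text": "Just trying this out..."})
version, _ = store.get("1")  # read the document and remember its version
store.index(
    "1",
    {"title": "My first blog entry", "text": "Starting to get the hang of this..."},
    version=version,
)
try:
    # A second writer still holding version 1 is rejected.
    store.index("1", {"title": "Stale edit"}, version=1)
except VersionConflict as exc:
    print("conflict:", exc)
```

The stale write fails instead of silently overwriting the newer text, which is the lost-update problem that slide 12 refers to.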
  14. Using Versions from an External System
      (1) Create a document with external version 5:
        PUT /website/blog/2?version=5&version_type=external
        { "title": "My first external blog entry", "text": "Starting to get the hang of this..." }
      Response:
        { "_index": "website", "_type": "blog", "_id": "2", "_version": 5, "created": true }
      (2) Update it with the higher version 10:
        PUT /website/blog/2?version=10&version_type=external
        { "title": "My first external blog entry", "text": "This is a piece of cake..." }
      Response:
        { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "created": false }
      (3) Repeat the same request with version 10:
        PUT /website/blog/2?version=10&version_type=external
        { "title": "My first external blog entry", "text": "This is a piece of cake..." }
      Response:
        { "error": "VersionConflictEngineException[[website][3] [blog][2]: version conflict, current [10], provided [10]]", "status": 409 }
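The three requests on slide 14 follow one rule: with `version_type=external`, a write is accepted only when the supplied version is greater than the stored one. A minimal Python sketch of that rule follows; the function `index_external` and the plain dict used as storage are assumptions made for illustration only.

```python
class VersionConflict(Exception):
    """Raised when the supplied version is not newer than the stored one."""


def index_external(docs, doc_id, source, version):
    """Store `source` under `doc_id` using an externally supplied version.

    `docs` maps id -> (version, source). Returns True if the document
    was created, False if an existing document was replaced.
    """
    current = docs.get(doc_id)
    if current is not None and version <= current[0]:
        raise VersionConflict(f"current [{current[0]}], provided [{version}]")
    docs[doc_id] = (version, source)
    return current is None


docs = {}
print(index_external(docs, "2", {"text": "Starting to get the hang of this..."}, 5))
print(index_external(docs, "2", {"text": "This is a piece of cake..."}, 10))
try:
    index_external(docs, "2", {"text": "This is a piece of cake..."}, 10)
except VersionConflict as exc:
    print("conflict:", exc)
```

Unlike the internal check, the stored version becomes whatever the external system supplied, so the external system stays the single source of version numbers.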
  15. Partial Updates to Documents
      (1) Merge new fields into the existing document:
        POST /website/blog/1/_update
        { "doc" : { "tags" : [ "testing" ], "views": 0 } }
      Response:
        { "_index": "website", "_type": "blog", "_id": "1", "_version": 3 }
      (2) Retrieve the updated document:
        GET /website/blog/1
        {
          "_index": "website", "_type": "blog", "_id": "1", "_version": 3, "found": true,
          "_source": {
            "title": "My first blog entry",
            "text": "Starting to get the hang of this...",
            "views": 0,
            "tags": [ "testing" ]
          }
        }
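The effect of the `doc` parameter on slide 15 can be approximated in plain Python. This sketch assumes the merge rule described in the reference book: nested objects are merged, while scalars and arrays are overwritten and new fields are added. `merge_partial` is a hypothetical helper, not an Elasticsearch API.

```python
import copy


def merge_partial(source, partial):
    """Return a new document with `partial` merged into `source`."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and arrays replace whatever was there before.
            merged[key] = copy.deepcopy(value)
    return merged


blog = {"title": "My first blog entry", "text": "Starting to get the hang of this..."}
print(merge_partial(blog, {"tags": ["testing"], "views": 0}))
```

Returning a new dict instead of mutating the input mirrors the point made on slide 9: the stored document is replaced as a whole, never edited in place.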
  16. Using Scripts to Make Partial Updates
      (1) Increment the views counter:
        POST /website/blog/1/_update
        { "script" : "ctx._source.views+=1" }
      Response:
        { "_index": "website", "_type": "blog", "_id": "1", "_version": 4 }
      (2) Append a tag, passed in as a parameter:
        POST /website/blog/1/_update
        { "script" : "ctx._source.tags+=new_tag", "params" : { "new_tag" : "search" } }
      Response:
        { "_index": "website", "_type": "blog", "_id": "1", "_version": 5 }
      (3) Retrieve the updated document:
        GET /website/blog/1
        {
          "_index": "website", "_type": "blog", "_id": "1", "_version": 5, "found": true,
          "_source": {
            "title": "My first blog entry",
            "text": "Starting to get the hang of this...",
            "views": 1,
            "tags": [ "testing", "search" ]
          }
        }
  17. Using Scripts to Make Partial Updates
      Delete a document based on its contents, by setting ctx.op to delete:
        POST /website/blog/1/_update
        { "script" : "ctx.op = ctx._source.views == count ? 'delete' : 'none'", "params" : { "count": 1 } }
      The document is gone afterwards:
        GET /website/blog/1
        { "_index": "website", "_type": "blog", "_id": "1", "found": false }
  18. Updating a Document That May Not Yet Exist
      Request:
        POST /website/pageviews/1/_update
        { "script" : "ctx._source.views+=1", "upsert": { "views": 1 } }
      Response:
        { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 1 }
      Retrieve the new document:
        GET /website/pageviews/1
        {
          "_index": "website", "_type": "pageviews", "_id": "1", "_version": 1, "found": true,
          "_source": { "views": 1 }
        }
  19. Update and Conflicts
      Retry the update up to five times before failing:
        POST /website/pageviews/1/_update?retry_on_conflict=5
        { "script" : "ctx._source.views+=1", "upsert": { "views": 0 } }
      Result:
        {
          "_index": "website", "_type": "pageviews", "_id": "1", "_version": 2,
          "found": true,
          "_source": { "views": 2 }
        }
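Slides 18 and 19 combine two ideas: `upsert` supplies the document to create when none exists, and `retry_on_conflict` repeats the read-modify-write cycle when another writer got in first. The following Python sketch imitates that loop against a toy store. All names (`Store`, `increment_views`, the `interfere` hook that simulates a concurrent writer) are hypothetical and exist only for this illustration.

```python
class VersionConflict(Exception):
    """Raised when a write is based on an outdated version."""


class Store:
    """Toy document store; maps id -> (version, source)."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return self.docs.get(doc_id)

    def put(self, doc_id, source, expected_version):
        current = self.docs.get(doc_id)
        current_version = current[0] if current else 0
        if current_version != expected_version:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = (current_version + 1, source)
        return current_version + 1


def increment_views(store, doc_id, upsert, retry_on_conflict=0, interfere=None):
    """Increment `views`, creating the document from `upsert` if it is missing."""
    attempts = retry_on_conflict + 1
    for attempt in range(attempts):
        current = store.get(doc_id)
        if current is None:
            version, new_source = 0, dict(upsert)
        else:
            version, source = current
            new_source = dict(source, views=source["views"] + 1)
        if interfere is not None:
            interfere(attempt)  # test hook: lets a rival write sneak in here
        try:
            return store.put(doc_id, new_source, expected_version=version)
        except VersionConflict:
            if attempt == attempts - 1:
                raise  # retries exhausted; surface the conflict


store = Store()
increment_views(store, "1", {"views": 1})
increment_views(store, "1", {"views": 1}, retry_on_conflict=5)
print(store.get("1"))
```

Retrying is safe here because the order of increments does not matter; for updates where order does matter, a retry could apply a change on top of data the caller never saw.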
  20. Retrieving Multiple Documents
      Request:
        GET /_mget
        {
          "docs" : [
            { "_index" : "website", "_type" : "blog", "_id" : 2 },
            { "_index" : "website", "_type" : "pageviews", "_id" : 1, "_source": "views" }
          ]
        }
      Response:
        {
          "docs": [
            { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true,
              "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } },
            { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 3, "found": true,
              "_source": { "views": 3 } }
          ]
        }
  21. Retrieving Multiple Documents
      With a default index and type in the URL, overridden per document where needed:
        GET /website/blog/_mget
        {
          "docs" : [
            { "_id" : 2 },
            { "_type" : "pageviews", "_id" : 1 }
          ]
        }
      Response:
        {
          "docs": [
            { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true,
              "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } },
            { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 3, "found": true,
              "_source": { "views": 3 } }
          ]
        }
  22. Retrieving Multiple Documents
      When all documents share the same index and type, an array of IDs is enough:
        GET /website/blog/_mget
        { "ids" : [ "2", "1" ] }
      Response:
        {
          "docs": [
            { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true,
              "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } },
            { "_index": "website", "_type": "blog", "_id": "1", "found": false }
          ]
        }
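As the response on slide 22 shows, an mget response can mix found and missing documents, so the caller has to inspect the `found` flag of every entry. A small Python sketch of that check follows; `split_mget` is a hypothetical helper that works on the already-parsed JSON response.

```python
def split_mget(response):
    """Split a parsed mget response into found sources and missing keys.

    Documents are keyed by (_index, _type, _id) because one request
    may span several indices and types.
    """
    found, missing = {}, []
    for doc in response["docs"]:
        key = (doc["_index"], doc["_type"], doc["_id"])
        if doc.get("found"):
            found[key] = doc.get("_source")
        else:
            missing.append(key)
    return found, missing


response = {
    "docs": [
        {"_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": True,
         "_source": {"title": "My first external blog entry",
                     "text": "This is a piece of cake..."}},
        {"_index": "website", "_type": "blog", "_id": "1", "found": False},
    ]
}
found, missing = split_mget(response)
print(sorted(found), missing)
```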
  23. Cheaper in Bulk
      The bulk request body has the following format:
        { action: { metadata }}\n
        { request body        }\n
        { action: { metadata }}\n
        { request body        }\n
        ...
      Request:
        POST /_bulk
        { "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
        { "create": { "_index": "website", "_type": "blog", "_id": "123" }}
        { "title": "My first blog post" }
        { "index": { "_index": "website", "_type": "blog" }}
        { "title": "My second blog post" }
        { "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
        { "doc" : {"title" : "My updated blog post"} }
      Response:
        {
          "took": 4,
          "errors": false,
          "items": [
            { "delete": { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "status": 404, "found": false } },
            { "create": { "_index": "website", "_type": "blog", "_id": "123", "_version": 2, "status": 201 } },
            { "create": { "_index": "website", "_type": "blog", "_id": "AVeVu4ZmPwPQAxVyMVtH", "_version": 1, "status": 201 } },
            { "update": { "_index": "website", "_type": "blog", "_id": "123", "_version": 3, "status": 200 } }
          ]
        }
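The line-oriented format on slide 23 is easy to get wrong by hand, so here is a minimal Python sketch that assembles it. `bulk_body` is a hypothetical helper, and the tuple layout of its input is an assumption of this example; it produces only the request body and does not send it.

```python
import json


def bulk_body(operations):
    """Build a bulk request body from (action, metadata, body) tuples.

    `body` must be None for "delete", which has no request body line.
    Every line, including the last, ends with a newline.
    """
    lines = []
    for action, metadata, body in operations:
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            continue
        if body is None:
            raise ValueError(f"action {action!r} needs a request body")
        lines.append(json.dumps(body))
    return "".join(line + "\n" for line in lines)


meta = {"_index": "website", "_type": "blog", "_id": "123"}
print(bulk_body([
    ("delete", meta, None),
    ("create", meta, {"title": "My first blog post"}),
    ("index", {"_index": "website", "_type": "blog"}, {"title": "My second blog post"}),
    ("update", dict(meta, _retry_on_conflict=3), {"doc": {"title": "My updated blog post"}}),
]), end="")
```

`json.dumps` without an `indent` argument keeps each object on a single line and escapes newlines inside strings, which is what a newline-delimited format needs.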
  24. Cheaper in Bulk
      Each action succeeds or fails independently of the others.
      Request:
        POST /_bulk
        { "create": { "_index": "website", "_type": "blog", "_id": "123" }}
        { "title": "Cannot create - it already exists" }
        { "index": { "_index": "website", "_type": "blog", "_id": "123" }}
        { "title": "But we can update it" }
      Response:
        {
          "took": 2,
          "errors": true,
          "items": [
            { "create": { "_index": "website", "_type": "blog", "_id": "123", "status": 409,
                          "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]" } },
            { "index": { "_index": "website", "_type": "blog", "_id": "123", "_version": 4, "status": 200 } }
          ]
        }
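Because one failed action does not abort the rest of a bulk request, as slide 24 shows, the caller has to walk the `items` array to find what went wrong. A short Python sketch, assuming the response shape shown on the slide, where a failed item carries an `error` field; `failed_items` is a hypothetical helper.

```python
def failed_items(response):
    """Return (position, action, status, error) for every failed bulk item."""
    if not response.get("errors"):
        return []  # the top-level flag says nothing failed
    failures = []
    for position, item in enumerate(response["items"]):
        # Each item holds exactly one key: the name of the action.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action, result.get("status"), result["error"]))
    return failures


response = {
    "took": 2,
    "errors": True,
    "items": [
        {"create": {"_index": "website", "_type": "blog", "_id": "123", "status": 409,
                    "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]"}},
        {"index": {"_index": "website", "_type": "blog", "_id": "123",
                   "_version": 4, "status": 200}},
    ],
}
for failure in failed_items(response):
    print(failure)
```

The position in `items` matches the order of actions in the request, so it can be used to find and resend only the failed actions.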
  25. Don’t Repeat Yourself
      A default index can be given in the URL:
        POST /website/_bulk
        { "index": { "_type": "log" }}
        { "event": "User logged in" }
      Response:
        {
          "took": 3,
          "errors": false,
          "items": [
            { "create": { "_index": "website", "_type": "log", "_id": "AVeVyqWVPwPQAxVyMV3_", "_version": 1, "status": 201 } }
          ]
        }
  26. Don’t Repeat Yourself
      A default index and type can be given in the URL and still overridden per action:
        POST /website/log/_bulk
        { "index": {}}
        { "event": "User logged in" }
        { "index": { "_type": "blog" }}
        { "title": "Overriding the default type" }
      Response:
        {
          "took": 2,
          "errors": false,
          "items": [
            { "create": { "_index": "website", "_type": "log", "_id": "AVeVzBQjPwPQAxVyMV4_", "_version": 1, "status": 201 } },
            { "create": { "_index": "website", "_type": "blog", "_id": "AVeVzBQjPwPQAxVyMV5A", "_version": 1, "status": 201 } }
          ]
        }
  27. How Big Is Too Big?
  28. References
      ● Elasticsearch: The Definitive Guide. A Distributed Real-Time Search and Analytics Engine. Clinton Gormley & Zachary Tong, O’Reilly
