Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

A presentation from the New Media Inspiration 2013 conference (http://www.tuesday.cz/akce/new-media-inspiration-2013/) about using Elasticsearch's faceting features for realtime analytics of big data.

Published in: Technology

  1. Real time analytics of big data with Elasticsearch. Karel Minařík
  2. Facets, Analytics, JSON http://www.youtube.com/watch?v=-GftBySG99Q
  3. http://karmi.cz http://elasticsearch.com Realtime Analytics With ElasticSearch
  4. Using a search engine for analytics? wat?
  5. HOW DOES SEARCH WORK? A collection of documents
       file_1.txt: The ruby is a pink to blood-red colored gemstone ...
       file_2.txt: Ruby is a dynamic, reflective, general-purpose object-oriented programming language ...
       file_3.txt: "Ruby" is a song by English rock band Kaiser Chiefs ...
  6. HOW DOES SEARCH WORK? How do you search documents?
       File.read('file_1.txt').include?('ruby')
       File.read('file_2.txt').include?('ruby')
       ...
  7. HOW DOES SEARCH WORK? The inverted index
       TOKENS       POSTINGS
       ruby         file_1.txt  file_2.txt  file_3.txt
       pink         file_1.txt
       gemstone     file_1.txt
       dynamic      file_2.txt
       reflective   file_2.txt
       programming  file_2.txt
       song         file_3.txt
       english      file_3.txt
       rock         file_3.txt
     http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  8. HOW DOES SEARCH WORK? The inverted index (same index as on slide 7)
       search "ruby"  ->  ruby: file_1.txt, file_2.txt, file_3.txt
  9. HOW DOES SEARCH WORK? The inverted index (same index as on slide 7)
       search "song"  ->  song: file_3.txt
  10. HOW DOES SEARCH WORK? The inverted index (same index as on slide 7)
       search "ruby AND song"  ->  intersect the postings of both tokens: file_3.txt
  11. HOW DOES SEARCH WORK? The inverted index. Statistics!
       TOKENS       COUNT  POSTINGS
       ruby         3      file_1.txt  file_2.txt  file_3.txt
       pink         1      file_1.txt
       gemstone            file_1.txt
       dynamic             file_2.txt
       reflective          file_2.txt
       programming         file_2.txt
       song                file_3.txt
       english             file_3.txt
       rock                file_3.txt
     http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  12. http://elasticsearch.org
  13. ElasticSearch is an open source, scalable, distributed, cloud-ready, highly-available full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene.
  14. FACETS: Faceted Navigation. Query, Facets. http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
  15. FACETS: Faceted Navigation with Elasticsearch
      Request ("query" is the user query, "filter" holds the "checkboxes", "facets" requests the facets):
        curl "http://localhost:9200/people/_search?pretty=true" -d '{
          "query" : {
            "match" : { "name" : "John" }
          },
          "filter" : {
            "terms" : { "employer" : ["IBM"] }
          },
          "facets" : {
            "employer" : {
              "terms" : {
                "field" : "employer",
                "size"  : 3
              }
            }
          }
        }'
      Response:
        "facets" : {
          "employer" : {
            "missing" : 0,
            "total" : 10,
            "other" : 3,
            "terms" : [ {
              "term" : "ibm",
              "count" : 3
            }, {
              "term" : "twitter",
              "count" : 2
            }, {
              "term" : "apple",
              "count" : 2
            } ]
          }
        }
      http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
  16. FACETS: Visualizing the Facets
        "facets" : {
          "employer" : {
            "missing" : 0,
            "total" : 10,
            "other" : 3,
            "terms" : [ {
              "term" : "ibm",
              "count" : 3
            }, {
              "term" : "twitter",
              "count" : 2
            }, {
              "term" : "apple",
              "count" : 2
            } ]
          }
        }
      DEMO: http://bl.ocks.org/4571766
      d3.js ~ A Bar Chart, Part 1 http://mbostock.github.com/d3/tutorial/bar-1.html
  17. FACETS: Visualizing the Facets
  18. FACETS: Visualizing the Facets
  19. FACETS: Visualizing the Facets http://demo.kibana.org
  20. Important Concepts
       ‣ No batch orientation
       ‣ No stats precomputation and caching
       ‣ No predefined metrics or schemas
       ‣ Combination of free text search, structured search, and facets
       ‣ Scripting for performing ad-hoc analytics
       ‣ Extendable: write your own facet types
  21. FACETS: Scripting
      Extract and aggregate most popular domains from article URLs
        curl -X DELETE localhost:9200/demo-articles
        curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {"type": "string", "index": "not_analyzed"}} } } }'
        curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
        curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
        curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
        curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
        curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
        curl -X POST localhost:9200/demo-articles/_refresh
        curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{
          "facets": {
            "popular-domains": {
              "terms": {
                "field"  : "url",
                "script" : "term.replace(new RegExp(\"https?://\"), \"\").split(\"/\")[0]",
                "lang"   : "javascript"
              }
            }
          }
        }'
      Response:
        "facets" : {
          "popular-domains" : {
            // ...
            "terms" : [ {
              "term" : "some.blogger.com", "count" : 3
            }, {
              "term" : "github.com", "count" : 1
            } ]
          }
        }
  22. FACETS: Demonstrations
      Demo: extract and aggregate most popular domains from article URLs (same commands, facet request and response as on slide 21).
  23. Thanks!
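
Slides 5 to 11 walk through the inverted index: tokens map to postings, a search is a lookup, an AND query is an intersection of postings, and the length of a postings list is a statistic that comes for free. A minimal Ruby sketch of that idea follows. It is not how Lucene or Elasticsearch implement it; the method names (`sample_documents`, `tokenize`, `build_index`, `search`, `search_all`, `document_frequency`) and the simple lowercase-and-split analyzer are assumptions made for illustration.

```ruby
require "set"

# The three documents from slide 5 (text truncated as on the slide).
def sample_documents
  {
    "file_1.txt" => "The ruby is a pink to blood-red colored gemstone",
    "file_2.txt" => "Ruby is a dynamic, reflective, general-purpose object-oriented programming language",
    "file_3.txt" => "\"Ruby\" is a song by English rock band Kaiser Chiefs"
  }
end

# Hypothetical analyzer: lowercase the text and split it into word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Build the inverted index: token => set of document names (the postings).
def build_index(documents)
  index = Hash.new { |hash, token| hash[token] = Set.new }
  documents.each do |name, text|
    tokenize(text).each { |token| index[token] << name }
  end
  index
end

# Single-token search: one hash lookup instead of reading every file.
def search(index, token)
  index.fetch(token.downcase, Set.new).to_a.sort
end

# AND search: intersect the postings of all tokens.
def search_all(index, *tokens)
  tokens.map { |token| index.fetch(token.downcase, Set.new) }.reduce(:&).to_a.sort
end

# The "Statistics!" column from slide 11: how many documents contain the token.
def document_frequency(index, token)
  index.fetch(token.downcase, Set.new).size
end

index = build_index(sample_documents)
p search(index, "ruby")              # ["file_1.txt", "file_2.txt", "file_3.txt"]
p search_all(index, "ruby", "song")  # ["file_3.txt"]
p document_frequency(index, "pink")  # 1
```

Unlike the `File.read(...).include?(...)` approach on slide 6, the cost of a lookup here does not grow with the size of each document, and the counts needed for analytics are already sitting in the index.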

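The terms facet response on slides 15 and 16 can be read as a count over the documents matched by the query. The Ruby sketch below computes the same four fields (`missing`, `total`, `other`, `terms`) in memory. The sample people are hypothetical data chosen so the numbers match the slide, and the method names are illustrative. Two further assumptions: ties are broken alphabetically here (the slide lists twitter before apple), and the top-level "filter" narrows the hits without changing the facet counts, which is how I understand the facets API of that era to behave.

```ruby
# Hypothetical sample data: ten people named John, chosen so that the facet
# counts match the response on slide 15. Names and employers are illustrative.
def sample_people
  employers = %w[ibm ibm ibm twitter twitter apple apple google oracle yahoo]
  employers.each_with_index.map do |employer, i|
    { "name" => "John Doe #{i + 1}", "employer" => employer }
  end
end

# Terms facet over one field: count each value and keep the top `size` terms.
def terms_facet(documents, field, size)
  values  = documents.map { |doc| doc[field] }
  present = values.compact
  counts  = present.each_with_object(Hash.new(0)) { |value, acc| acc[value] += 1 }
  # Highest count first; ties broken alphabetically (an assumption of this sketch).
  top = counts.sort_by { |term, count| [-count, term] }.first(size)
  {
    "missing" => values.size - present.size,                  # documents without a value
    "total"   => present.size,                                # all values counted
    "other"   => present.size - top.sum { |_, count| count }, # values outside the top terms
    "terms"   => top.map { |term, count| { "term" => term, "count" => count } }
  }
end

# Query, then facet, then filter: the facet sees every query match,
# while the "checkbox" filter only narrows the returned hits.
def faceted_search(documents, name:, employers:, facet_field:, size:)
  matches = documents.select { |doc| doc["name"].downcase.include?(name.downcase) }
  facet   = terms_facet(matches, facet_field, size)
  wanted  = employers.map(&:downcase)
  hits    = matches.select { |doc| wanted.include?(doc[facet_field].to_s.downcase) }
  { "hits" => hits, "facets" => { facet_field => facet } }
end

result = faceted_search(sample_people, name: "John", employers: ["IBM"],
                        facet_field: "employer", size: 3)
puts result["hits"].size                       # 3
puts result["facets"]["employer"]["total"]     # 10
puts result["facets"]["employer"]["other"]     # 3
```

Computing the facet before applying the filter is what keeps the other checkboxes meaningful: after ticking "IBM", the counts for the remaining employers are still shown.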
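
The scripted facet on slide 21 rewrites each `url` term before counting it: strip the scheme, keep everything up to the first slash. The Ruby sketch below mirrors that transformation outside Elasticsearch, using the URLs from the slide. The method names (`index_articles`, `domain`, `popular_domains`) are illustrative. Note that the slide writes both github.com URLs to id 5; on my reading the second PUT replaces the first, which would explain why github.com is counted once in the response.

```ruby
# Replay the PUT requests from slide 21 into a hash keyed by document id.
# Writing to an existing id replaces the earlier document.
def index_articles
  writes = [
    [1, "http://some.blogger.com/2012/09/01/index.html"],
    [2, "http://some.blogger.com/2012/09/11/index.html"],
    [3, "http://some.blogger.com/about.html"],
    [5, "https://github.com/user/A"],
    [5, "http://github.com/user/B"]
  ]
  writes.each_with_object({}) { |(id, url), index| index[id] = { "url" => url } }
end

# Ruby equivalent of the JavaScript facet script on the slide:
#   term.replace(new RegExp("https?://"), "").split("/")[0]
def domain(url)
  url.sub(%r{https?://}, "").split("/")[0]
end

# Count documents per extracted domain, most frequent first.
def popular_domains(articles)
  counts = articles.values.each_with_object(Hash.new(0)) do |doc, acc|
    acc[domain(doc["url"])] += 1
  end
  counts.sort_by { |term, count| [-count, term] }
        .map { |term, count| { "term" => term, "count" => count } }
end

popular_domains(index_articles).each do |entry|
  puts "#{entry['term']}: #{entry['count']}"
end
# some.blogger.com: 3
# github.com: 1
```

Because the field is mapped as `not_analyzed`, each URL reaches the script as a single term, so the extraction can be done at query time without reindexing, which is the ad-hoc analytics point made on slide 20.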