SlideShare a Scribd company logo
1 of 23
Download to read offline
Real time analytics
of big data with Elasticsearch




Karel Minařík
cets
                                      Fa




                                      ly tics
  SON                        Ana
J

 http://www.youtube.com/watch?v=-GftBySG99Q
http://karmi.cz
http://elasticsearch.com


                  Realtime Analytics With ElasticSearch
Using a search engine for analytics?


wat?

                              Realtime Analytics With ElasticSearch
HOW DOES SEARCH WORK?

A collection of documents




      file_1.txt
      The  ruby  is  a  pink  to  blood-­‐red  colored  gemstone  ...


      file_2.txt
      Ruby  is  a  dynamic,  reflective,  general-­‐purpose  object-­‐oriented  
      programming  language  ...

      file_3.txt
      "Ruby"  is  a  song  by  English  rock  band  Kaiser  Chiefs  ...
HOW DOES SEARCH WORK?

How do you search documents?




File.read('file_1.txt').include?('ruby')
File.read('file_2.txt').include?('ruby')
...
HOW DOES SEARCH WORK?

The inverted index

TOKENS                         POSTINGS



 ruby                           file_1.txt        file_2.txt          file_3.txt
 pink                           file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
HOW DOES SEARCH WORK?

The inverted index

search  "ruby"

 ruby                           file_1.txt        file_2.txt          file_3.txt
 pink                           file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
HOW DOES SEARCH WORK?

The inverted index

search  "song"

 ruby                           file_1.txt        file_2.txt          file_3.txt
 pink                           file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
HOW DOES SEARCH WORK?

The inverted index

search  "ruby  AND  song"

 ruby                           file_1.txt        file_2.txt          file_3.txt
 pink                           file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
HOW DOES SEARCH WORK?

The inverted index

TOKENS                         POSTINGS
                              Statistics!


 ruby    3                      file_1.txt        file_2.txt          file_3.txt
 pink    1                      file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
http://elasticsearch.org
ElasticSearch is an open source, scalable,
distributed, cloud-ready, highly-available full-
text search engine and database with powerful
aggregation features, communicating by JSON
over RESTful HTTP, based on Apache
Lucene.


                                 Realtime Analytics With ElasticSearch
FACETS

    Faceted Navigation



Query




Facets




                         http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
FACETS

Faceted Navigation with Elasticsearch

curl  "http://localhost:9200/people/_search?pretty=true"  -­‐d  '
{
    "query"  :  {
        "match"  :  {  "name"  :  "John"}                                                        User query
    },
    "filter"  :  {
        "terms"  :  {  "employer"  :  ["IBM"]  }                                                 “Checkboxes”
    },
    "facets"  :  {
        "employer"  :  {
            "terms"  :  {                                                                        Facets
                    "field"  :  "employer",
                    "size"    :  3
            }                                    "facets"  :  {
        }                                                "employer"  :  {
    }                                                        "missing"  :  0,
}'
                                                                     "total"  :  10,
                                                                     "other"  :  3,
                                                                     "terms"  :  [  {
                                                                         "term"  :  "ibm",
                           Response                                      "count"  :  3
                                                                     },  {
                                                                         "term"  :  "twitter",
                                                                         "count"  :  2
                                                                     },  {
                                                                         "term"  :  "apple",
                                                                         "count"  :  2
                                                                     }  ]
                                                                 }
                                                             }

http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
FACETS

Visualizing the Facets



  "facets"  :  {
          "employer"  :  {
              "missing"  :  0,
              "total"  :  10,
              "other"  :  3,
              "terms"  :  [  {
                  "term"  :  "ibm",
                  "count"  :  3
              },  {
                  "term"  :  "twitter",
                  "count"  :  2
              },  {
                  "term"  :  "apple",
                  "count"  :  2
              }  ]                                    DEMO: http://bl.ocks.org/4571766
          }
      }




  d3.js ~ A Bar Chart, Part 1
  http://mbostock.github.com/d3/tutorial/bar-1.html
FACETS

Visualizing the Facets
FACETS

Visualizing the Facets
FACETS

Visualizing the Facets




http://demo.kibana.org
Important Concepts
‣ No batch orientation
‣ No stats precomputation and caching
‣ No predefined metrics or schemas

‣ Combination of free text search, structured
  search, and facets
‣ Scripting for performing ad–hoc analytics
‣ Extendable: write your own facet types


                                 Realtime Analytics With ElasticSearch
FACETS

Scripting
Extract and aggregate most popular domains from article URLs
curl -X DELETE localhost:9200/demo-articles
curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {type: "string", "index": "not_analyzed"}} } } }'


curl         -X   PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
curl         -X   PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
curl         -X   POST localhost:9200/demo-articles/_refresh

curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{
  "facets": {
    "popular-domains": {
      "terms": {
        "field" :    "url",
                  "script" :   "term.replace(new            RegExp("https?://"), "").split("/")[0]",
                  "lang"   :   "javascript"
              }
         }
     }
                                                             "facets"  :  {
}'
                                                                     "popular-­‐domains"  :  {
                                                                         //  ...
                                                                         "terms"  :  [  {
                               Response                                      "term"  :  "some.blogger.com",  "count"  :  3
                                                                         },  {
                                                                             "term"  :  "github.com",  "count"  :  1
                                                                         }  ]
                                                                     }
                                                                 }
FACETS

Demonstrations
Extract and aggregate most popular domains from article URLs
curl -X DELETE localhost:9200/demo-articles
curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {type: "string", "index": "not_analyzed"}} } } }'


curl         -X   PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
curl         -X   PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
curl         -X   POST localhost:9200/demo-articles/_refresh

curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{
  "facets": {
    "popular-domains": {
      "terms": {
        "field" :    "url",
                  "script" :   "term.replace(new            RegExp("https?://"), "").split("/")[0]",
                  "lang"   :   "javascript"
              }



}'
     }
         }
                                                                        Demo
                                                             "facets"  :  {
                                                                     "popular-­‐domains"  :  {
                                                                         //  ...
                                                                         "terms"  :  [  {
                               Response                                      "term"  :  "some.blogger.com",  "count"  :  3
                                                                         },  {
                                                                             "term"  :  "github.com",  "count"  :  1
                                                                         }  ]
                                                                     }
                                                                 }
Thanks!
  d

More Related Content

Viewers also liked

Cgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analyticsCgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analyticsbrock55
 
Real Time Recommendation System using Kiji
Real Time Recommendation System using KijiReal Time Recommendation System using Kiji
Real Time Recommendation System using KijiDaqing Zhao
 
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.comTDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.comDaqing Zhao
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 MinutesKarel Minarik
 
Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)Karel Minarik
 
Real-Time Personalization
Real-Time PersonalizationReal-Time Personalization
Real-Time PersonalizationRichard Veryard
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBasedave_revell
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to ElasticsearchClifford James
 
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoH2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoSri Ambati
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseKristijan Duvnjak
 
Big Data Predictive Analytics for Retail businesses
Big Data Predictive Analytics for Retail businessesBig Data Predictive Analytics for Retail businesses
Big Data Predictive Analytics for Retail businessesGopalakrishna Palem
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Kevin Weil
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...DataStax
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Spark Summit
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data AnalyticsAmazon Web Services
 

Viewers also liked (16)

Cgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analyticsCgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analytics
 
Real Time Recommendation System using Kiji
Real Time Recommendation System using KijiReal Time Recommendation System using Kiji
Real Time Recommendation System using Kiji
 
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.comTDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
 
Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)
 
Real-Time Personalization
Real-Time PersonalizationReal-Time Personalization
Real-Time Personalization
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoH2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
 
Big Data Predictive Analytics for Retail businesses
Big Data Predictive Analytics for Retail businessesBig Data Predictive Analytics for Retail businesses
Big Data Predictive Analytics for Retail businesses
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
 
Customer Journey Analytics and Big Data
Customer Journey Analytics and Big DataCustomer Journey Analytics and Big Data
Customer Journey Analytics and Big Data
 

Similar to Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Karel Minarik
 
Using the whole web as your dataset
Using the whole web as your datasetUsing the whole web as your dataset
Using the whole web as your datasetTuri, Inc.
 
Semantic search within Earth Observation products databases based on automati...
Semantic search within Earth Observation products databases based on automati...Semantic search within Earth Observation products databases based on automati...
Semantic search within Earth Observation products databases based on automati...Gasperi Jerome
 
ElasticSearch with Tire
ElasticSearch with TireElasticSearch with Tire
ElasticSearch with TireDavid Yun
 
Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1medcl
 
Visualizing Data in Elasticsearch DevFest DC 2016
Visualizing Data in Elasticsearch DevFest DC 2016Visualizing Data in Elasticsearch DevFest DC 2016
Visualizing Data in Elasticsearch DevFest DC 2016David Erickson
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)net2-project
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012Jimmy Lai
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Djangotow21
 
How to put an annotation in html
How to put an annotation in htmlHow to put an annotation in html
How to put an annotation in htmlSTIinnsbruck
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch BasicsShifa Khan
 
First steps towards publishing library data on the semantic web
First steps towards publishing library data on the semantic webFirst steps towards publishing library data on the semantic web
First steps towards publishing library data on the semantic webhorvadam
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearchMinsoo Jun
 
RESTo - restful semantic search tool for geospatial
RESTo - restful semantic search tool for geospatialRESTo - restful semantic search tool for geospatial
RESTo - restful semantic search tool for geospatialGasperi Jerome
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 

Similar to Realtime Analytics With Elasticsearch [New Media Inspiration 2013] (20)

Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)
 
Using the whole web as your dataset
Using the whole web as your datasetUsing the whole web as your dataset
Using the whole web as your dataset
 
Semantic search within Earth Observation products databases based on automati...
Semantic search within Earth Observation products databases based on automati...Semantic search within Earth Observation products databases based on automati...
Semantic search within Earth Observation products databases based on automati...
 
ElasticSearch with Tire
ElasticSearch with TireElasticSearch with Tire
ElasticSearch with Tire
 
Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1
 
Visualizing Data in Elasticsearch DevFest DC 2016
Visualizing Data in Elasticsearch DevFest DC 2016Visualizing Data in Elasticsearch DevFest DC 2016
Visualizing Data in Elasticsearch DevFest DC 2016
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012
 
Linked Data on Rails
Linked Data on RailsLinked Data on Rails
Linked Data on Rails
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Django
 
Mining legal texts with Python
Mining legal texts with PythonMining legal texts with Python
Mining legal texts with Python
 
How to put an annotation in html
How to put an annotation in htmlHow to put an annotation in html
How to put an annotation in html
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch Basics
 
First steps towards publishing library data on the semantic web
First steps towards publishing library data on the semantic webFirst steps towards publishing library data on the semantic web
First steps towards publishing library data on the semantic web
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearch
 
RESTo - restful semantic search tool for geospatial
RESTo - restful semantic search tool for geospatialRESTo - restful semantic search tool for geospatial
RESTo - restful semantic search tool for geospatial
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
 
Google code search
Google code searchGoogle code search
Google code search
 

More from Karel Minarik

Vizualizace dat a D3.js [EUROPEN 2014]
Vizualizace dat a D3.js [EUROPEN 2014]Vizualizace dat a D3.js [EUROPEN 2014]
Vizualizace dat a D3.js [EUROPEN 2014]Karel Minarik
 
Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Karel Minarik
 
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)Karel Minarik
 
Redis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational DatabasesRedis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational DatabasesKarel Minarik
 
CouchDB – A Database for the Web
CouchDB – A Database for the WebCouchDB – A Database for the Web
CouchDB – A Database for the WebKarel Minarik
 
Spoiling The Youth With Ruby (Euruko 2010)
Spoiling The Youth With Ruby (Euruko 2010)Spoiling The Youth With Ruby (Euruko 2010)
Spoiling The Youth With Ruby (Euruko 2010)Karel Minarik
 
Verzovani kodu s Gitem (Karel Minarik)
Verzovani kodu s Gitem (Karel Minarik)Verzovani kodu s Gitem (Karel Minarik)
Verzovani kodu s Gitem (Karel Minarik)Karel Minarik
 
Představení Ruby on Rails [Junior Internet]
Představení Ruby on Rails [Junior Internet]Představení Ruby on Rails [Junior Internet]
Představení Ruby on Rails [Junior Internet]Karel Minarik
 
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)Karel Minarik
 
Úvod do Ruby on Rails
Úvod do Ruby on RailsÚvod do Ruby on Rails
Úvod do Ruby on RailsKarel Minarik
 
Úvod do programování 7
Úvod do programování 7Úvod do programování 7
Úvod do programování 7Karel Minarik
 
Úvod do programování 6
Úvod do programování 6Úvod do programování 6
Úvod do programování 6Karel Minarik
 
Úvod do programování 5
Úvod do programování 5Úvod do programování 5
Úvod do programování 5Karel Minarik
 
Úvod do programování 4
Úvod do programování 4Úvod do programování 4
Úvod do programování 4Karel Minarik
 
Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)Karel Minarik
 
Historie programovacích jazyků
Historie programovacích jazykůHistorie programovacích jazyků
Historie programovacích jazykůKarel Minarik
 
Úvod do programování aneb Do nitra stroje
Úvod do programování aneb Do nitra strojeÚvod do programování aneb Do nitra stroje
Úvod do programování aneb Do nitra strojeKarel Minarik
 
Interaktivita, originalita a návrhové vzory
Interaktivita, originalita a návrhové vzoryInteraktivita, originalita a návrhové vzory
Interaktivita, originalita a návrhové vzoryKarel Minarik
 

More from Karel Minarik (18)

Vizualizace dat a D3.js [EUROPEN 2014]
Vizualizace dat a D3.js [EUROPEN 2014]Vizualizace dat a D3.js [EUROPEN 2014]
Vizualizace dat a D3.js [EUROPEN 2014]
 
Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]
 
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
 
Redis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational DatabasesRedis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational Databases
 
CouchDB – A Database for the Web
CouchDB – A Database for the WebCouchDB – A Database for the Web
CouchDB – A Database for the Web
 
Spoiling The Youth With Ruby (Euruko 2010)
Spoiling The Youth With Ruby (Euruko 2010)Spoiling The Youth With Ruby (Euruko 2010)
Spoiling The Youth With Ruby (Euruko 2010)
 
Verzovani kodu s Gitem (Karel Minarik)
Verzovani kodu s Gitem (Karel Minarik)Verzovani kodu s Gitem (Karel Minarik)
Verzovani kodu s Gitem (Karel Minarik)
 
Představení Ruby on Rails [Junior Internet]
Představení Ruby on Rails [Junior Internet]Představení Ruby on Rails [Junior Internet]
Představení Ruby on Rails [Junior Internet]
 
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
 
Úvod do Ruby on Rails
Úvod do Ruby on RailsÚvod do Ruby on Rails
Úvod do Ruby on Rails
 
Úvod do programování 7
Úvod do programování 7Úvod do programování 7
Úvod do programování 7
 
Úvod do programování 6
Úvod do programování 6Úvod do programování 6
Úvod do programování 6
 
Úvod do programování 5
Úvod do programování 5Úvod do programování 5
Úvod do programování 5
 
Úvod do programování 4
Úvod do programování 4Úvod do programování 4
Úvod do programování 4
 
Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)
 
Historie programovacích jazyků
Historie programovacích jazykůHistorie programovacích jazyků
Historie programovacích jazyků
 
Úvod do programování aneb Do nitra stroje
Úvod do programování aneb Do nitra strojeÚvod do programování aneb Do nitra stroje
Úvod do programování aneb Do nitra stroje
 
Interaktivita, originalita a návrhové vzory
Interaktivita, originalita a návrhové vzoryInteraktivita, originalita a návrhové vzory
Interaktivita, originalita a návrhové vzory
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 

Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

  • 1. Real time analytics of big data with Elasticsearch Karel Minařík
  • 2. cets Fa ly tics SON Ana J http://www.youtube.com/watch?v=-GftBySG99Q
  • 3. http://karmi.cz http://elasticsearch.com Realtime Analytics With ElasticSearch
  • 4. Using a search engine for analytics? wat? Realtime Analytics With ElasticSearch
  • 5. HOW DOES SEARCH WORK? A collection of documents file_1.txt The  ruby  is  a  pink  to  blood-­‐red  colored  gemstone  ... file_2.txt Ruby  is  a  dynamic,  reflective,  general-­‐purpose  object-­‐oriented   programming  language  ... file_3.txt "Ruby"  is  a  song  by  English  rock  band  Kaiser  Chiefs  ...
  • 6. HOW DOES SEARCH WORK? How do you search documents? File.read('file_1.txt').include?('ruby') File.read('file_2.txt').include?('ruby') ...
  • 7. HOW DOES SEARCH WORK? The inverted index TOKENS POSTINGS ruby file_1.txt file_2.txt file_3.txt pink file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 8. HOW DOES SEARCH WORK? The inverted index search  "ruby" ruby file_1.txt file_2.txt file_3.txt pink file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 9. HOW DOES SEARCH WORK? The inverted index search  "song" ruby file_1.txt file_2.txt file_3.txt pink file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 10. HOW DOES SEARCH WORK? The inverted index search  "ruby  AND  song" ruby file_1.txt file_2.txt file_3.txt pink file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 11. HOW DOES SEARCH WORK? The inverted index TOKENS POSTINGS Statistics! ruby 3 file_1.txt file_2.txt file_3.txt pink 1 file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 13. ElasticSearch is an open source, scalable, distributed, cloud-ready, highly-available full- text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene. Realtime Analytics With ElasticSearch
  • 14. FACETS Faceted Navigation Query Facets http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
  • 15. FACETS Faceted Navigation with Elasticsearch curl  "http://localhost:9200/people/_search?pretty=true"  -­‐d  ' {    "query"  :  {        "match"  :  {  "name"  :  "John"} User query    },    "filter"  :  {        "terms"  :  {  "employer"  :  ["IBM"]  } “Checkboxes”    },    "facets"  :  {        "employer"  :  {            "terms"  :  { Facets                    "field"  :  "employer",                    "size"    :  3            } "facets"  :  {        }        "employer"  :  {    }            "missing"  :  0, }'            "total"  :  10,            "other"  :  3,            "terms"  :  [  {                "term"  :  "ibm", Response                "count"  :  3            },  {                "term"  :  "twitter",                "count"  :  2            },  {                "term"  :  "apple",                "count"  :  2            }  ]        }    } http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
  • 16. FACETS Visualizing the Facets "facets"  :  {        "employer"  :  {            "missing"  :  0,            "total"  :  10,            "other"  :  3,            "terms"  :  [  {                "term"  :  "ibm",                "count"  :  3            },  {                "term"  :  "twitter",                "count"  :  2            },  {                "term"  :  "apple",                "count"  :  2            }  ] DEMO: http://bl.ocks.org/4571766        }    } d3.js ~ A Bar Chart, Part 1 http://mbostock.github.com/d3/tutorial/bar-1.html
  • 20. Important Concepts ‣ No batch orientation ‣ No stats precomputation and caching ‣ No predefined metrics or schemas ‣ Combination of free text search, structured search, and facets ‣ Scripting for performing ad–hoc analytics ‣ Extendable: write your own facet types Realtime Analytics With ElasticSearch
  • 21. FACETS Scripting Extract and aggregate most popular domains from article URLs curl -X DELETE localhost:9200/demo-articles curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {type: "string", "index": "not_analyzed"}} } } }' curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}' curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}' curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}' curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}' curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}' curl -X POST localhost:9200/demo-articles/_refresh curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{ "facets": { "popular-domains": { "terms": { "field" : "url", "script" : "term.replace(new RegExp("https?://"), "").split("/")[0]", "lang" : "javascript" } } } "facets"  :  { }'        "popular-­‐domains"  :  {            //  ...            "terms"  :  [  { Response                "term"  :  "some.blogger.com",  "count"  :  3            },  {                "term"  :  "github.com",  "count"  :  1            }  ]        }    }
  • 22. FACETS Demonstrations Extract and aggregate most popular domains from article URLs curl -X DELETE localhost:9200/demo-articles curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {type: "string", "index": "not_analyzed"}} } } }' curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}' curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}' curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}' curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}' curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}' curl -X POST localhost:9200/demo-articles/_refresh curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{ "facets": { "popular-domains": { "terms": { "field" : "url", "script" : "term.replace(new RegExp("https?://"), "").split("/")[0]", "lang" : "javascript" } }' } } Demo "facets"  :  {        "popular-­‐domains"  :  {            //  ...            "terms"  :  [  { Response                "term"  :  "some.blogger.com",  "count"  :  3            },  {                "term"  :  "github.com",  "count"  :  1            }  ]        }    }