SlideShare a Scribd company logo
1 of 23
Download to read offline
Real time analytics
of big data with Elasticsearch




Karel Minařík
cets
                                      Fa




                                      ly tics
  SON                        Ana
J

 http://www.youtube.com/watch?v=-GftBySG99Q
http://karmi.cz
http://elasticsearch.com


                  Realtime Analytics With ElasticSearch
Using a search engine for analytics?


wat?

                              Realtime Analytics With ElasticSearch
HOW DOES SEARCH WORK?

A collection of documents




      file_1.txt
      The  ruby  is  a  pink  to  blood-­‐red  colored  gemstone  ...


      file_2.txt
      Ruby  is  a  dynamic,  reflective,  general-­‐purpose  object-­‐oriented  
      programming  language  ...

      file_3.txt
      "Ruby"  is  a  song  by  English  rock  band  Kaiser  Chiefs  ...
HOW DOES SEARCH WORK?

How do you search documents?




File.read('file_1.txt').include?('ruby')
File.read('file_2.txt').include?('ruby')
...
HOW DOES SEARCH WORK?

The inverted index

TOKENS                         POSTINGS



 ruby                           file_1.txt        file_2.txt          file_3.txt
 pink                           file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
HOW DOES SEARCH WORK?

The inverted index

search  "ruby"

 ruby                           file_1.txt        file_2.txt          file_3.txt
 pink                           file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
HOW DOES SEARCH WORK?

The inverted index

search  "song"

 ruby                           file_1.txt        file_2.txt          file_3.txt
 pink                           file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
HOW DOES SEARCH WORK?

The inverted index

search  "ruby  AND  song"

 ruby                           file_1.txt        file_2.txt          file_3.txt
 pink                           file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
HOW DOES SEARCH WORK?

The inverted index

TOKENS                         POSTINGS
                              Statistics!


 ruby    3                      file_1.txt        file_2.txt          file_3.txt
 pink    1                      file_1.txt
 gemstone                       file_1.txt

 dynamic                                         file_2.txt
 reflective                                      file_2.txt
 programming                                     file_2.txt

 song                                                                 file_3.txt
 english                                                              file_3.txt
 rock                                                                 file_3.txt

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
http://elasticsearch.org
ElasticSearch is an open source, scalable,
distributed, cloud-ready, highly-available full-
text search engine and database with powerful
aggregation features, communicating by JSON
over RESTful HTTP, based on Apache
Lucene.


                                 Realtime Analytics With ElasticSearch
FACETS

    Faceted Navigation



Query




Facets




                         http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
FACETS

Faceted Navigation with Elasticsearch

curl  "http://localhost:9200/people/_search?pretty=true"  -­‐d  '
{
    "query"  :  {
        "match"  :  {  "name"  :  "John"}                                                        User query
    },
    "filter"  :  {
        "terms"  :  {  "employer"  :  ["IBM"]  }                                                 “Checkboxes”
    },
    "facets"  :  {
        "employer"  :  {
            "terms"  :  {                                                                        Facets
                    "field"  :  "employer",
                    "size"    :  3
            }                                    "facets"  :  {
        }                                                "employer"  :  {
    }                                                        "missing"  :  0,
}'
                                                                     "total"  :  10,
                                                                     "other"  :  3,
                                                                     "terms"  :  [  {
                                                                         "term"  :  "ibm",
                           Response                                      "count"  :  3
                                                                     },  {
                                                                         "term"  :  "twitter",
                                                                         "count"  :  2
                                                                     },  {
                                                                         "term"  :  "apple",
                                                                         "count"  :  2
                                                                     }  ]
                                                                 }
                                                             }

http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
FACETS

Visualizing the Facets



  "facets"  :  {
          "employer"  :  {
              "missing"  :  0,
              "total"  :  10,
              "other"  :  3,
              "terms"  :  [  {
                  "term"  :  "ibm",
                  "count"  :  3
              },  {
                  "term"  :  "twitter",
                  "count"  :  2
              },  {
                  "term"  :  "apple",
                  "count"  :  2
              }  ]                                    DEMO: http://bl.ocks.org/4571766
          }
      }




  d3.js ~ A Bar Chart, Part 1
  http://mbostock.github.com/d3/tutorial/bar-1.html
FACETS

Visualizing the Facets
FACETS

Visualizing the Facets
FACETS

Visualizing the Facets




http://demo.kibana.org
Important Concepts
‣ No batch orientation
‣ No stats precomputation and caching
‣ No predefined metrics or schemas

‣ Combination of free text search, structured
  search, and facets
‣ Scripting for performing ad–hoc analytics
‣ Extendable: write your own facet types


                                 Realtime Analytics With ElasticSearch
FACETS

Scripting
Extract and aggregate most popular domains from article URLs
curl -X DELETE localhost:9200/demo-articles
curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {type: "string", "index": "not_analyzed"}} } } }'


curl         -X   PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
curl         -X   PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
curl         -X   POST localhost:9200/demo-articles/_refresh

curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{
  "facets": {
    "popular-domains": {
      "terms": {
        "field" :    "url",
                  "script" :   "term.replace(new            RegExp("https?://"), "").split("/")[0]",
                  "lang"   :   "javascript"
              }
         }
     }
                                                             "facets"  :  {
}'
                                                                     "popular-­‐domains"  :  {
                                                                         //  ...
                                                                         "terms"  :  [  {
                               Response                                      "term"  :  "some.blogger.com",  "count"  :  3
                                                                         },  {
                                                                             "term"  :  "github.com",  "count"  :  1
                                                                         }  ]
                                                                     }
                                                                 }
FACETS

Demonstrations
Extract and aggregate most popular domains from article URLs
curl -X DELETE localhost:9200/demo-articles
curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {type: "string", "index": "not_analyzed"}} } } }'


curl         -X   PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
curl         -X   PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
curl         -X   PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
curl         -X   POST localhost:9200/demo-articles/_refresh

curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{
  "facets": {
    "popular-domains": {
      "terms": {
        "field" :    "url",
                  "script" :   "term.replace(new            RegExp("https?://"), "").split("/")[0]",
                  "lang"   :   "javascript"
              }



}'
     }
         }
                                                                        Demo
                                                             "facets"  :  {
                                                                     "popular-­‐domains"  :  {
                                                                         //  ...
                                                                         "terms"  :  [  {
                               Response                                      "term"  :  "some.blogger.com",  "count"  :  3
                                                                         },  {
                                                                             "term"  :  "github.com",  "count"  :  1
                                                                         }  ]
                                                                     }
                                                                 }
Thanks!
  d

More Related Content

Viewers also liked

Cgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analyticsCgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analytics
brock55
 

Viewers also liked (16)

Cgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analyticsCgc2 cdn gamingsummit-real-time-customer-analytics
Cgc2 cdn gamingsummit-real-time-customer-analytics
 
Real Time Recommendation System using Kiji
Real Time Recommendation System using KijiReal Time Recommendation System using Kiji
Real Time Recommendation System using Kiji
 
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.comTDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
TDWI Solution Summit San Diego 2014 Advanced Analytics at Macys.com
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
 
Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)
 
Real-Time Personalization
Real-Time PersonalizationReal-Time Personalization
Real-Time Personalization
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoH2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
 
Big Data Predictive Analytics for Retail businesses
Big Data Predictive Analytics for Retail businessesBig Data Predictive Analytics for Retail businesses
Big Data Predictive Analytics for Retail businesses
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
 
Customer Journey Analytics and Big Data
Customer Journey Analytics and Big DataCustomer Journey Analytics and Big Data
Customer Journey Analytics and Big Data
 

Similar to Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

Similar to Realtime Analytics With Elasticsearch [New Media Inspiration 2013] (20)

Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)
 
Using the whole web as your dataset
Using the whole web as your datasetUsing the whole web as your dataset
Using the whole web as your dataset
 
Semantic search within Earth Observation products databases based on automati...
Semantic search within Earth Observation products databases based on automati...Semantic search within Earth Observation products databases based on automati...
Semantic search within Earth Observation products databases based on automati...
 
ElasticSearch with Tire
ElasticSearch with TireElasticSearch with Tire
ElasticSearch with Tire
 
Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1
 
Visualizing Data in Elasticsearch DevFest DC 2016
Visualizing Data in Elasticsearch DevFest DC 2016Visualizing Data in Elasticsearch DevFest DC 2016
Visualizing Data in Elasticsearch DevFest DC 2016
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012
 
Linked Data on Rails
Linked Data on RailsLinked Data on Rails
Linked Data on Rails
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Django
 
Mining legal texts with Python
Mining legal texts with PythonMining legal texts with Python
Mining legal texts with Python
 
How to put an annotation in html
How to put an annotation in htmlHow to put an annotation in html
How to put an annotation in html
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch Basics
 
First steps towards publishing library data on the semantic web
First steps towards publishing library data on the semantic webFirst steps towards publishing library data on the semantic web
First steps towards publishing library data on the semantic web
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearch
 
RESTo - restful semantic search tool for geospatial
RESTo - restful semantic search tool for geospatialRESTo - restful semantic search tool for geospatial
RESTo - restful semantic search tool for geospatial
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
 
Google code search
Google code searchGoogle code search
Google code search
 

More from Karel Minarik

Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]
Karel Minarik
 
Redis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational DatabasesRedis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational Databases
Karel Minarik
 
Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)
Karel Minarik
 

More from Karel Minarik (18)

Vizualizace dat a D3.js [EUROPEN 2014]
Vizualizace dat a D3.js [EUROPEN 2014]Vizualizace dat a D3.js [EUROPEN 2014]
Vizualizace dat a D3.js [EUROPEN 2014]
 
Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]
 
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
Shell's Kitchen: Infrastructure As Code (Webexpo 2012)
 
Redis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational DatabasesRedis — The AK-47 of Post-relational Databases
Redis — The AK-47 of Post-relational Databases
 
CouchDB – A Database for the Web
CouchDB – A Database for the WebCouchDB – A Database for the Web
CouchDB – A Database for the Web
 
Spoiling The Youth With Ruby (Euruko 2010)
Spoiling The Youth With Ruby (Euruko 2010)Spoiling The Youth With Ruby (Euruko 2010)
Spoiling The Youth With Ruby (Euruko 2010)
 
Verzovani kodu s Gitem (Karel Minarik)
Verzovani kodu s Gitem (Karel Minarik)Verzovani kodu s Gitem (Karel Minarik)
Verzovani kodu s Gitem (Karel Minarik)
 
Představení Ruby on Rails [Junior Internet]
Představení Ruby on Rails [Junior Internet]Představení Ruby on Rails [Junior Internet]
Představení Ruby on Rails [Junior Internet]
 
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
Efektivni vyvoj webovych aplikaci v Ruby on Rails (Webexpo)
 
Úvod do Ruby on Rails
Úvod do Ruby on RailsÚvod do Ruby on Rails
Úvod do Ruby on Rails
 
Úvod do programování 7
Úvod do programování 7Úvod do programování 7
Úvod do programování 7
 
Úvod do programování 6
Úvod do programování 6Úvod do programování 6
Úvod do programování 6
 
Úvod do programování 5
Úvod do programování 5Úvod do programování 5
Úvod do programování 5
 
Úvod do programování 4
Úvod do programování 4Úvod do programování 4
Úvod do programování 4
 
Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)Úvod do programování 3 (to be continued)
Úvod do programování 3 (to be continued)
 
Historie programovacích jazyků
Historie programovacích jazykůHistorie programovacích jazyků
Historie programovacích jazyků
 
Úvod do programování aneb Do nitra stroje
Úvod do programování aneb Do nitra strojeÚvod do programování aneb Do nitra stroje
Úvod do programování aneb Do nitra stroje
 
Interaktivita, originalita a návrhové vzory
Interaktivita, originalita a návrhové vzoryInteraktivita, originalita a návrhové vzory
Interaktivita, originalita a návrhové vzory
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

  • 1. Real time analytics of big data with Elasticsearch Karel Minařík
  • 2. cets Fa ly tics SON Ana J http://www.youtube.com/watch?v=-GftBySG99Q
  • 3. http://karmi.cz http://elasticsearch.com Realtime Analytics With ElasticSearch
  • 4. Using a search engine for analytics? wat? Realtime Analytics With ElasticSearch
  • 5. HOW DOES SEARCH WORK? A collection of documents file_1.txt The  ruby  is  a  pink  to  blood-­‐red  colored  gemstone  ... file_2.txt Ruby  is  a  dynamic,  reflective,  general-­‐purpose  object-­‐oriented   programming  language  ... file_3.txt "Ruby"  is  a  song  by  English  rock  band  Kaiser  Chiefs  ...
  • 6. HOW DOES SEARCH WORK? How do you search documents? File.read('file_1.txt').include?('ruby') File.read('file_2.txt').include?('ruby') ...
  • 7. HOW DOES SEARCH WORK? The inverted index TOKENS POSTINGS ruby file_1.txt file_2.txt file_3.txt pink file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 8. HOW DOES SEARCH WORK? The inverted index search  "ruby" ruby file_1.txt file_2.txt file_3.txt pink file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 9. HOW DOES SEARCH WORK? The inverted index search  "song" ruby file_1.txt file_2.txt file_3.txt pink file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 10. HOW DOES SEARCH WORK? The inverted index search  "ruby  AND  song" ruby file_1.txt file_2.txt file_3.txt pink file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 11. HOW DOES SEARCH WORK? The inverted index TOKENS POSTINGS Statistics! ruby 3 file_1.txt file_2.txt file_3.txt pink 1 file_1.txt gemstone file_1.txt dynamic file_2.txt reflective file_2.txt programming file_2.txt song file_3.txt english file_3.txt rock file_3.txt http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
  • 13. ElasticSearch is an open source, scalable, distributed, cloud-ready, highly-available full- text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene. Realtime Analytics With ElasticSearch
  • 14. FACETS Faceted Navigation Query Facets http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
  • 15. FACETS Faceted Navigation with Elasticsearch curl  "http://localhost:9200/people/_search?pretty=true"  -­‐d  ' {    "query"  :  {        "match"  :  {  "name"  :  "John"} User query    },    "filter"  :  {        "terms"  :  {  "employer"  :  ["IBM"]  } “Checkboxes”    },    "facets"  :  {        "employer"  :  {            "terms"  :  { Facets                    "field"  :  "employer",                    "size"    :  3            } "facets"  :  {        }        "employer"  :  {    }            "missing"  :  0, }'            "total"  :  10,            "other"  :  3,            "terms"  :  [  {                "term"  :  "ibm", Response                "count"  :  3            },  {                "term"  :  "twitter",                "count"  :  2            },  {                "term"  :  "apple",                "count"  :  2            }  ]        }    } http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
  • 16. FACETS Visualizing the Facets "facets"  :  {        "employer"  :  {            "missing"  :  0,            "total"  :  10,            "other"  :  3,            "terms"  :  [  {                "term"  :  "ibm",                "count"  :  3            },  {                "term"  :  "twitter",                "count"  :  2            },  {                "term"  :  "apple",                "count"  :  2            }  ] DEMO: http://bl.ocks.org/4571766        }    } d3.js ~ A Bar Chart, Part 1 http://mbostock.github.com/d3/tutorial/bar-1.html
  • 20. Important Concepts ‣ No batch orientation ‣ No stats precomputation and caching ‣ No predefined metrics or schemas ‣ Combination of free text search, structured search, and facets ‣ Scripting for performing ad–hoc analytics ‣ Extendable: write your own facet types Realtime Analytics With ElasticSearch
  • 21. FACETS Scripting Extract and aggregate most popular domains from article URLs curl -X DELETE localhost:9200/demo-articles curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {type: "string", "index": "not_analyzed"}} } } }' curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}' curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}' curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}' curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}' curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}' curl -X POST localhost:9200/demo-articles/_refresh curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{ "facets": { "popular-domains": { "terms": { "field" : "url", "script" : "term.replace(new RegExp("https?://"), "").split("/")[0]", "lang" : "javascript" } } } "facets"  :  { }'        "popular-­‐domains"  :  {            //  ...            "terms"  :  [  { Response                "term"  :  "some.blogger.com",  "count"  :  3            },  {                "term"  :  "github.com",  "count"  :  1            }  ]        }    }
  • 22. FACETS Demonstrations Extract and aggregate most popular domains from article URLs curl -X DELETE localhost:9200/demo-articles curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {type: "string", "index": "not_analyzed"}} } } }' curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}' curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}' curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}' curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}' curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}' curl -X POST localhost:9200/demo-articles/_refresh curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{ "facets": { "popular-domains": { "terms": { "field" : "url", "script" : "term.replace(new RegExp("https?://"), "").split("/")[0]", "lang" : "javascript" } }' } } Demo "facets"  :  {        "popular-­‐domains"  :  {            //  ...            "terms"  :  [  { Response                "term"  :  "some.blogger.com",  "count"  :  3            },  {                "term"  :  "github.com",  "count"  :  1            }  ]        }    }