SlideShare a Scribd company logo
Scaling Analytics with
     elasticsearch
      Dan Noble
      @dwnoble
Background
• Technologist at The HumanGeo
• We use elasticsearch to build social media
  analysis tools
• 100MM documents indexed
• 600GB+ index size
• Author of Python elasticsearch driver “rawes”
  https://github.com/humangeo/rawes
Overview
• What is elasticsearch?
• Scaling with elasticsearch
• How can I use elasticsearch to help with
  analytics?
• Use Case: Social Media Analytics
What is elasticsearch?
Search Engine
•   Open source
•   Distributed
•   Automatic failover
•   Crazy fast
Search Engine
•   Actively maintained
•   REST API
•   JSON messages
•   Lucene based
Search
                  Elasticsearch “Cluster”

                           Host


                      Index: Articles




• Simple case: one host
• One index containing a set of articles
Distributed Search
                                Elasticsearch “Cluster”

                    Host                                     Host


                 Articles (a)                             Articles (b)




• Too much data?
• Add another host
• Indices can be broken up into “shards” and live on different
  machines
Redundancy
                          Elasticsearch Cluster

              Host                                   Host

           Articles (a)                           Articles (b)

           Articles (b)                           Articles (a)




• Shards can be replicated to improve
  availability
Node Auto Discovery
                       Elasticsearch Cluster

           Host                Host               Host

        Articles (a)        Articles (b)       Articles (b)


        Articles (b)        Articles (a)       Articles (a)




• Say we add a third host
• elasticsearch will automatically start moving
  shards to this new host to distribute load
Failover
                          Elasticsearch Cluster

             Host                 Host               Host

          Articles (a)         Articles (b)       Articles (b)


          Articles (b)         Articles (a)       Articles (a)




• Say a host goes down
• Shards on that host are no longer available for search
• Elasticsearch automatically rebuilds these two shards on other
  hosts
Querying
                    Elasticsearch Cluster

        Host                Host                Host

     Articles (a)        Articles (b)        Articles (b)


                                             Articles(a)



                    Query: “Barack Obama”

Can query against          Client
                                            Search for articles
    any host
                          (Web
                        Application)         Send request to
                                              other shards if
                                                 needed
REST API
• JSON query syntax
• Developer friendly
• Easy to get started
Python Example
import rawes
es = rawes.Elastic('elastic-00:9200')

es.get('articles/_search', data={
   "query": {
     "filtered" : {
         "query" : {
            "query_string" : {
                       "query" : "Barack Obama"
            }
         }
     }
   }
})
Community
Elasticsearch Summary
•   Scales horizontally
•   Redundancy
•   Configures itself automatically
•   Developer friendly
Analytics and elasticsearch
•   Date Histograms
•   Statistical facets
•   Geospatial queries
•   All with arbitrary search parameters
•   Again: Fast
Use Case: Social Media Analysis
• Use social media APIs to search for data on a
  topic of interest
• 100MM documents indexed
• Sentiment analysis
• Location extraction (“Geotagging”)
Sample Document
es.post('articles/facebook', data={
   ”date": "2012-09-01 08:37:55",
   "tags": {
       "sentiment": {
           "positive": 0.36,
           "negative": 0.10
       }
       "geotags": [{
           "term" : "Cairo",
           "location" : "30.0566,31.2262”,
           “type” : “geo_point”
       }],
       "search_terms": [
           "Mohamed Morsi"
       ]
    },
   "item": {
       "publisher: "Facebook"
       "source_domain": "www.facebook.com",
       "author": "James Smith",
       "source_url": "http://www.facebook.com/5551231234/posts/414141414141",
       "content_text": "Mohamed Morsi visits Iran for first time since 1979 ....",
       "title": "James Smith posted a note to Facebook",
       "author_url: "http://www.facebook.com/profile.php?id=5551231234"
   }
})
Analytical Queries
Date Histogram for Sentiment
es.get('articles/_search', data=
{
   "query" : {
      "query_string" : {
        "query" : "Mohamed Morsi"
      }
   },
   "facets" : {
      "sentiment_histogram" : {
        "date_histogram" : {
           "key_field" : "date_of_information.$date",
           "value_field" : "tags.sentiment.positive",
           "interval" : "day"
        }
      }
   }
})
Date Histogram for Sentiment
Statistical Facet for Sentiment: Query
es.get('articles/_search', data={
   "query" : {
      "query_string" : {
        "query" : "Mohamed Morsi"
      }
   },
   "facets" : {
      "sentiment_stats" : {
        "statistical" : {
           "field" : "tags.sentiment.positive"
        }
      }
   }
})
Statistical Facet for Sentiment: Result
{
    "facets": {
       "sentiment_stats": {
          "_type": "statistical",
          "count": 8825,
          "max": 0.375,
          "mean": 0.008503991588291782,
          "min": 0.0,
          "std_deviation": 0.021251077265305472,
          "sum_of_squares": 4.623648343200283,
          "total": 75.04772576667497,
          "variance": 0.00045160828493598306
       }
    },
    "hits": {
       "hits": [],
       "max_score": 1.1120162,
       "total": 8825
    },
    "took": 60
}
Top Keywords
es.get('articles/_search', data={
   "query" : {
      "match_all" : {}
   },
   "facets" : {
      "search_terms" : {
        "terms" : {
           "field" : "tags.search_terms",
           "size" : 3
        }
      }
   }
})
Top Search Terms
Geospatial search
es.get('articles/_search', data={
   "query" : {
     "filtered" : {
         "filter" : {
             "geo_distance" : {
             "distance" : ”20km",
             "tags.geotags.location" : {
                    "lat" : 30,
                    "lon" : 31
                }
             }
         }
     }
   }
})
Questions

More Related Content

What's hot

Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
foundsearch
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
Ricardo Peres
 
ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014
Roy Russo
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
George Stathis
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
Eric Rodriguez (Hiring in Lex)
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
Alexandre Rafalovitch
 
Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
Roopendra Vishwakarma
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solrmacrochen
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
Jurriaan Persyn
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
Kristijan Duvnjak
 
Elasticsearch for Data Analytics
Elasticsearch for Data AnalyticsElasticsearch for Data Analytics
Elasticsearch for Data Analytics
Felipe
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
Sperasoft
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
Jason Austin
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
Anurag Patel
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search EngineElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
Daniel N
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
Clifford James
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!
Philips Kokoh Prasetyo
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
Ruslan Zavacky
 
Elastic search
Elastic searchElastic search
Elastic search
NexThoughts Technologies
 

What's hot (19)

Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solr
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
 
Elasticsearch for Data Analytics
Elasticsearch for Data AnalyticsElasticsearch for Data Analytics
Elasticsearch for Data Analytics
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search EngineElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Elastic search
Elastic searchElastic search
Elastic search
 

Viewers also liked

Présentation de la solution Strategeex
Présentation de la solution StrategeexPrésentation de la solution Strategeex
Présentation de la solution Strategeex
Visiativ
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
Codemotion
 
Data Exploration with Elasticsearch
Data Exploration with ElasticsearchData Exploration with Elasticsearch
Data Exploration with Elasticsearch
Aleksander Stensby
 
Analyse des Sentiments -cas twitter- "Opinion Detection with Machine Lerning "
Analyse des Sentiments  -cas twitter- "Opinion Detection with Machine Lerning "Analyse des Sentiments  -cas twitter- "Opinion Detection with Machine Lerning "
Analyse des Sentiments -cas twitter- "Opinion Detection with Machine Lerning "
Soumia Elyakote HERMA
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
StampedeCon
 

Viewers also liked (6)

Presentation
PresentationPresentation
Presentation
 
Présentation de la solution Strategeex
Présentation de la solution StrategeexPrésentation de la solution Strategeex
Présentation de la solution Strategeex
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
 
Data Exploration with Elasticsearch
Data Exploration with ElasticsearchData Exploration with Elasticsearch
Data Exploration with Elasticsearch
 
Analyse des Sentiments -cas twitter- "Opinion Detection with Machine Lerning "
Analyse des Sentiments  -cas twitter- "Opinion Detection with Machine Lerning "Analyse des Sentiments  -cas twitter- "Opinion Detection with Machine Lerning "
Analyse des Sentiments -cas twitter- "Opinion Detection with Machine Lerning "
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 

Similar to Scaling Analytics with elasticsearch

Extensible RESTful Applications with Apache TinkerPop
Extensible RESTful Applications with Apache TinkerPopExtensible RESTful Applications with Apache TinkerPop
Extensible RESTful Applications with Apache TinkerPop
Varun Ganesh
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
Amazon Web Services
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life琛琳 饶
 
REST easy with API Platform
REST easy with API PlatformREST easy with API Platform
REST easy with API Platform
Antonio Peric-Mazar
 
Using ElasticSearch as a fast, flexible, and scalable solution to search occu...
Using ElasticSearch as a fast, flexible, and scalable solution to search occu...Using ElasticSearch as a fast, flexible, and scalable solution to search occu...
Using ElasticSearch as a fast, flexible, and scalable solution to search occu...
kristgen
 
Managing Social Content with MongoDB
Managing Social Content with MongoDBManaging Social Content with MongoDB
Managing Social Content with MongoDB
MongoDB
 
Elasticsearch as a Database?
Elasticsearch as a Database?Elasticsearch as a Database?
Elasticsearch as a Database?
Amazon Web Services
 
MongoDB
MongoDBMongoDB
Elasticsearch as a Database?
Elasticsearch as a Database?Elasticsearch as a Database?
Elasticsearch as a Database?
Amazon Web Services
 
Elasticsearch speed is key
Elasticsearch speed is keyElasticsearch speed is key
Elasticsearch speed is key
Enterprise Search Warsaw Meetup
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Sean Laurent
 
ElasticSearch: Найдется все... и быстро!
ElasticSearch: Найдется все... и быстро!ElasticSearch: Найдется все... и быстро!
ElasticSearch: Найдется все... и быстро!
Alexander Byndyu
 
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch ServiceAWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
Amazon Web Services
 
MongoDB at ZPUGDC
MongoDB at ZPUGDCMongoDB at ZPUGDC
MongoDB at ZPUGDC
Mike Dirolf
 
Elastic search intro-@lamper
Elastic search intro-@lamperElastic search intro-@lamper
Elastic search intro-@lamper
medcl
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
GeeksLab Odessa
 
曾勇 Elastic search-intro
曾勇 Elastic search-intro曾勇 Elastic search-intro
曾勇 Elastic search-intro
Shaoning Pan
 
Elasticsearch a real-time distributed search and analytics engine
Elasticsearch a real-time distributed search and analytics engineElasticsearch a real-time distributed search and analytics engine
Elasticsearch a real-time distributed search and analytics engine
gautam kumar
 
Log Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar SeriesLog Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar Series
Amazon Web Services
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013ElasticSearch AJUG 2013
ElasticSearch AJUG 2013
Roy Russo
 

Similar to Scaling Analytics with elasticsearch (20)

Extensible RESTful Applications with Apache TinkerPop
Extensible RESTful Applications with Apache TinkerPopExtensible RESTful Applications with Apache TinkerPop
Extensible RESTful Applications with Apache TinkerPop
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
REST easy with API Platform
REST easy with API PlatformREST easy with API Platform
REST easy with API Platform
 
Using ElasticSearch as a fast, flexible, and scalable solution to search occu...
Using ElasticSearch as a fast, flexible, and scalable solution to search occu...Using ElasticSearch as a fast, flexible, and scalable solution to search occu...
Using ElasticSearch as a fast, flexible, and scalable solution to search occu...
 
Managing Social Content with MongoDB
Managing Social Content with MongoDBManaging Social Content with MongoDB
Managing Social Content with MongoDB
 
Elasticsearch as a Database?
Elasticsearch as a Database?Elasticsearch as a Database?
Elasticsearch as a Database?
 
MongoDB
MongoDBMongoDB
MongoDB
 
Elasticsearch as a Database?
Elasticsearch as a Database?Elasticsearch as a Database?
Elasticsearch as a Database?
 
Elasticsearch speed is key
Elasticsearch speed is keyElasticsearch speed is key
Elasticsearch speed is key
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
ElasticSearch: Найдется все... и быстро!
ElasticSearch: Найдется все... и быстро!ElasticSearch: Найдется все... и быстро!
ElasticSearch: Найдется все... и быстро!
 
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch ServiceAWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
 
MongoDB at ZPUGDC
MongoDB at ZPUGDCMongoDB at ZPUGDC
MongoDB at ZPUGDC
 
Elastic search intro-@lamper
Elastic search intro-@lamperElastic search intro-@lamper
Elastic search intro-@lamper
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
曾勇 Elastic search-intro
曾勇 Elastic search-intro曾勇 Elastic search-intro
曾勇 Elastic search-intro
 
Elasticsearch a real-time distributed search and analytics engine
Elasticsearch a real-time distributed search and analytics engineElasticsearch a real-time distributed search and analytics engine
Elasticsearch a real-time distributed search and analytics engine
 
Log Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar SeriesLog Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar Series
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013ElasticSearch AJUG 2013
ElasticSearch AJUG 2013
 

Recently uploaded

IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 

Recently uploaded (20)

IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 

Scaling Analytics with elasticsearch

  • 1. Scaling Analytics with elasticsearch Dan Noble @dwnoble
  • 2. Background • Technologist at The HumanGeo • We use elasticsearch to build social media analysis tools • 100MM documents indexed • 600GB+ index size • Author of Python elasticsearch driver “rawes” https://github.com/humangeo/rawes
  • 3. Overview • What is elasticsearch? • Scaling with elasticsearch • How can I use elasticsearch to help with analytics? • Use Case: Social Media Analytics
  • 5. Search Engine • Open source • Distributed • Automatic failover • Crazy fast
  • 6. Search Engine • Actively maintained • REST API • JSON messages • Lucene based
  • 7. Search Elasticsearch “Cluster” Host Index: Articles • Simple case: one host • One index containing a set of articles
  • 8. Distributed Search Elasticsearch “Cluster” Host Host Articles (a) Articles (b) • Too much data? • Add another host • Indices can be broken up into “shards” and live on different machines
  • 9. Redundancy Elasticsearch Cluster Host Host Articles (a) Articles (b) Articles (b) Articles (a) • Shards can be replicated to improve availability
  • 10. Node Auto Discovery Elasticsearch Cluster Host Host Host Articles (a) Articles (b) Articles (b) Articles (b) Articles (a) Articles (a) • Say we add a third host • elasticsearch will automatically start moving shards to this new host to distribute load
  • 11. Failover Elasticsearch Cluster Host Host Host Articles (a) Articles (b) Articles (b) Articles (b) Articles (a) Articles (a) • Say a host goes down • Shards on that host are no longer available for search • Elasticsearch automatically rebuilds these two shards on other hosts
  • 12. Querying Elasticsearch Cluster Host Host Host Articles (a) Articles (b) Articles (b) Articles(a) Query: “Barack Obama” Can query against Client Search for articles any host (Web Application) Send request to other shards if needed
  • 13. REST API • JSON query syntax • Developer friendly • Easy to get started
  • 14. Python Example import rawes es = rawes.Elastic('elastic-00:9200') es.get('articles/_search', data={ "query": { "filtered" : { "query" : { "query_string" : { "query" : "Barack Obama" } } } } })
  • 16. Elasticsearch Summary • Scales horizontally • Redundancy • Configures itself automatically • Developer friendly
  • 17. Analytics and elasticsearch • Date Histograms • Statistical facets • Geospatial queries • All with arbitrary search parameters • Again: Fast
  • 18. Use Case: Social Media Analysis • Use social media APIs to search for data on a topic of interest • 100MM documents indexed • Sentiment analysis • Location extraction (“Geotagging”)
  • 19. Sample Document es.post('articles/facebook', data={ ”date": "2012-09-01 08:37:55", "tags": { "sentiment": { "positive": 0.36, "negative": 0.10 } "geotags": [{ "term" : "Cairo", "location" : "30.0566,31.2262”, “type” : “geo_point” }], "search_terms": [ "Mohamed Morsi" ] }, "item": { "publisher: "Facebook" "source_domain": "www.facebook.com", "author": "James Smith", "source_url": "http://www.facebook.com/5551231234/posts/414141414141", "content_text": "Mohamed Morsi visits Iran for first time since 1979 ....", "title": "James Smith posted a note to Facebook", "author_url: "http://www.facebook.com/profile.php?id=5551231234" } })
  • 21. Date Histogram for Sentiment es.get('articles/_search', data= { "query" : { "query_string" : { "query" : "Mohamed Morsi" } }, "facets" : { "sentiment_histogram" : { "date_histogram" : { "key_field" : "date_of_information.$date", "value_field" : "tags.sentiment.positive", "interval" : "day" } } } })
  • 22. Date Histogram for Sentiment
  • 23. Statistical Facet for Sentiment: Query es.get('articles/_search', data={ "query" : { "query_string" : { "query" : "Mohamed Morsi" } }, "facets" : { "sentiment_stats" : { "statistical" : { "field" : "tags.sentiment.positive" } } } })
  • 24. Statistical Facet for Sentiment: Result { "facets": { "sentiment_stats": { "_type": "statistical", "count": 8825, "max": 0.375, "mean": 0.008503991588291782, "min": 0.0, "std_deviation": 0.021251077265305472, "sum_of_squares": 4.623648343200283, "total": 75.04772576667497, "variance": 0.00045160828493598306 } }, "hits": { "hits": [], "max_score": 1.1120162, "total": 8825 }, "took": 60 }
  • 25. Top Keywords es.get('articles/_search', data={ "query" : { "match_all" : {} }, "facets" : { "search_terms" : { "terms" : { "field" : "tags.search_terms", "size" : 3 } } } })
  • 27. Geospatial search es.get('articles/_search', data={ "query" : { "filtered" : { "filter" : { "geo_distance" : { "distance" : ”20km", "tags.geotags.location" : { "lat" : 30, "lon" : 31 } } } } } })