Monitoring tools for ElasticSearch
SF Meetup
2013.03.06

Sushant Shankar
Shyam Kuttikkad
• Why and how we use ElasticSearch
• Monitoring
  – Tools
  – Index Building
  – Query Performance
Who is 33Across
• Social Sharing and Content Discovery platform
   – We help >600,000 publishers with content distribution, user
     engagement, and advertising monetization
   – 450 Fortune 1000 brand marketers leverage our unique social signals
     to deliver impactful advertising
• We develop Machine Learning algorithms operating on Big
  Data to:
   – Provide content sharing insights to Publishers
   – Build customized audience segments for advertising campaigns
   – Extract actionable insights out of social and interest data




www.33Across.com
www.tynt.com
Data firehose of 30B monthly events, 1.25B cookies
   – Interaction with web content
   – Shares – images, copies
   – Searches

Build, understand, analyze
Real-time view
ElasticSearch!
• Social Audiences
• Behavior
• Context
• Knowledge
Production ElasticSearch cluster

Hardware
• 6 nodes, 24GB RAM
• 16GB for ES service
• 4 cores
• 3x 1.5TB drive

Index
• >1TB/index (replicated)
• ~300M documents
• ~5KB / document
• ~3 hours

Build index
• using MR job and Bulk API
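
The figures above imply a sustained indexing rate that is worth keeping in mind when reading the monitoring graphs. A rough back-of-envelope sketch, assuming decimal units (1 KB = 1000 bytes) and assuming the "~3 hours" figure is the wall-clock build time; neither assumption is stated on the slide:

```python
# Back-of-envelope numbers derived from the slide's approximate figures.
DOCS = 300_000_000        # ~300M documents
DOC_BYTES = 5_000         # ~5KB per document (assumption: 1 KB = 1000 bytes)
BUILD_SECONDS = 3 * 3600  # ~3 hours (assumption: this is the build time)

raw_bytes = DOCS * DOC_BYTES                   # size of the source documents
docs_per_second = DOCS / BUILD_SECONDS         # cluster-wide indexing rate
bytes_per_second = raw_bytes / BUILD_SECONDS   # cluster-wide ingest bandwidth

print(f"raw source size : {raw_bytes / 1e12:.2f} TB")
print(f"indexing rate   : {docs_per_second:,.0f} docs/s")
print(f"ingest bandwidth: {bytes_per_second / 1e6:,.0f} MB/s")
```

These are averages over the whole cluster; the on-disk index size will differ from the raw source size.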
System monitoring using Zabbix
[Screenshot: Index Build]
ElasticSearch specific monitoring using SPM

Scalable Performance Monitoring (http://sematext.com/spm/index.html)
• Index stats – Total/Refreshed/Merged documents
• Shards – Total/Active/Relocating/Initializing
• Search – Request rate and latency
• Cache – {Filter, field} cache {count, evictions, size}
• Machine – CPU, Memory, JVM, GC, Network, Disk
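
The shard counters listed above can also be read directly from ElasticSearch's cluster health response. The following is a minimal sketch, not how SPM collects its data: it summarizes a health response that has already been fetched and parsed. The sample values are made up, and the field names are given as I recall them from the cluster health API, so check them against your version.

```python
import json

# Hypothetical sample shaped like a cluster health response; values are made up.
SAMPLE_HEALTH = json.loads("""
{
  "status": "yellow",
  "number_of_nodes": 6,
  "active_primary_shards": 48,
  "active_shards": 48,
  "relocating_shards": 2,
  "initializing_shards": 4,
  "unassigned_shards": 42
}
""")

SHARD_KEYS = (
    "active_shards",
    "relocating_shards",
    "initializing_shards",
    "unassigned_shards",
)


def shard_summary(health):
    """Pick the shard counters out of a parsed health response."""
    return {key: health.get(key, 0) for key in SHARD_KEYS}


def is_settled(health):
    """True when the cluster is green and no shards are moving or missing."""
    counts = shard_summary(health)
    return (
        health.get("status") == "green"
        and counts["relocating_shards"] == 0
        and counts["initializing_shards"] == 0
        and counts["unassigned_shards"] == 0
    )


print(shard_summary(SAMPLE_HEALTH))
print("settled:", is_settled(SAMPLE_HEALTH))
```

A check like `is_settled` is useful before starting a query benchmark, so that shard relocation does not distort the latency numbers.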
Index Building Optimization using Zabbix and SPM
[Diagram labels: Amount bulk indexed; Time taken, CPU util., Mem util., Disk I/O, Network; # Shards]
in practice…
Debugging and Validating using SPM
Index Building: Learnings
• 2 shards / CPU
• 10,000 documents (users) per indexing
  request

• Bulk API for our use case
• No replicas
• Refresh off (index.refresh_interval = -1)
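
The learnings above can be sketched in code. This is an illustration only, not the MR job from the talk: it groups documents into batches of 10,000, serializes each batch in the newline-delimited format the Bulk API expects, and holds the build-time settings as a dict. The index name, type name, document shape, and the helper `suggested_shards` (which assumes "CPU" means one core) are all hypothetical. The `_type` field matches ElasticSearch versions of that era and is not used in recent versions.

```python
import json
from itertools import islice

BATCH_SIZE = 10_000  # documents per bulk request, from the slide

# Build-phase index settings from the slide: no replicas, refresh off.
BUILD_SETTINGS = {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def suggested_shards(nodes, cores_per_node, shards_per_core=2):
    """Shard count under the '2 shards / CPU' rule (assumption: CPU = core)."""
    return nodes * cores_per_node * shards_per_core


def batches(docs, size=BATCH_SIZE):
    """Yield lists of at most `size` items from any iterable of documents."""
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_body(chunk, index, doc_type):
    """Serialize (doc_id, source) pairs as a Bulk API request body.

    Each document becomes an action line followed by a source line,
    and the body ends with a newline.
    """
    lines = []
    for doc_id, source in chunk:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Usage with made-up documents: 25,000 users become 3 bulk requests.
users = ((f"user-{i}", {"shares": i % 7}) for i in range(25_000))
for chunk in batches(users):
    body = bulk_body(chunk, index="users", doc_type="user")
    print(len(chunk), "documents,", len(body), "bytes")

print("shards for 6 nodes x 4 cores:", suggested_shards(6, 4))
```

Each body would be sent as the payload of one bulk request. Sending is left out here so the sketch stays independent of any particular client library.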
Query Performance: Learnings
• 1-2 replicas (also for reliability)
• Turn refresh on again (5s default)
• Warm up effect (Index Warm up API 0.20+)
• Optimize API
• Simulate multiple users
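
"Simulate multiple users" can be approximated with a small thread pool that issues queries concurrently and records per-query latency. This is a minimal sketch rather than the harness used for the talk: `run_query` is a hypothetical callable you would supply (for example, a function that sends one search request), and the stand-in below only sleeps.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_users(run_query, queries, users=8):
    """Run `queries` from `users` concurrent threads.

    Returns the per-query latencies in seconds, sorted ascending.
    """
    def timed(query):
        start = time.perf_counter()
        run_query(query)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=users) as pool:
        return sorted(pool.map(timed, queries))


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list."""
    if not sorted_values:
        raise ValueError("no values")
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def fake_query(query):
    """Stand-in for a real search call; it only sleeps briefly."""
    time.sleep(0.001)


latencies = simulate_users(fake_query, range(200), users=8)
print(f"median: {percentile(latencies, 50) * 1000:.2f} ms")
print(f"p95   : {percentile(latencies, 95) * 1000:.2f} ms")
```

Running the same query set twice shows the warm-up effect: the first pass pays for loading data into memory and caches, so compare percentiles from the second pass onward.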
QUERIES?
Sushant Shankar
sushant.shankar@33across.com


     Shyam Kuttikkad
shyam.kuttikkad@33across.com
Why we really need a search engine
Batch! Good for complicated tasks (Machine Learning, Graph Algorithms, etc.)
…                           …
Warm Up: load into memory and cache
Other cool features
• Custom Scoring functions
• Scripts – MVEL, Python
• Facets
• Exploring:
   – Real-time indexing
   – Indexing images, files, etc.
   – Parent-child relationships

SF ElasticSearch Meetup 2013.04.06 - Monitoring


Editor's Notes

  • #7 http://www.zabbix.com/ - "Enterprise class monitoring solution for everyone"
  • #11 http://www.zabbix.com/ - "Enterprise class monitoring solution for everyone"
  • #16 Collect information on over 1B users internationally: text copied from over 600K publisher sites, images, searches, pages visited. Different slices of data – now!