Performance Testing and Scaling Elasticsearch
Jo Draeger 27/11/2018
Agenda
● Signal’s Use Case & Challenges
● Performance & Scaling Journey
● Live Experiments
Signal & Me
Signal: signalmedia.co @SignalHQ
Text Analytics Start-Up, founded in 2013
Media Monitoring & more
100 people, about 20 in tech/data science/product
We’re hiring!
Joachim Draeger: linkedin.com/in/joachimdraeger/ @joachimdraeger
Lead Software Engineer, joined two years ago
Terraformed Infrastructure, Tamed Elasticsearch, Built up Monitoring
Currently developing full-stack on Signal’s User Management and Login security
Before: 10 years of Java
Signal’s Use Case & Challenges
[Diagram: Signal’s AI text-analytics pipeline]
Content providers (Print, Online, Broadcast) → Signal AI text-analytics pipeline (Transformation, Deduplication, Story collation, Entity Recognition, Topic Classification, Summarisation) → Users (Alerts, API)
Use Cases
● PR
○ Monitor own reputation, campaigns and spokespeople
○ Monitor competition
○ Target media
○ Target topics
● Business Opportunities & Risks
○ Mergers & Acquisitions
○ Corporate crises
○ Product Launches
○ Patents
○ Tax & regulation
Private and Confidential
Data in Elasticsearch
● Latest 15 months of the world’s news
● AI powered annotations
○ Entities (Apple vs apples)
○ Topics
○ Quotes
○ Sentiment
● Full text for keyword searches
● Source
● … and more
Challenges & Usage Characteristics
● Thousands of users with heterogeneous demands
○ Some are only interested in their own coverage (1 entity)
○ Some are interested in many different, specific things
○ => spiky load, sometimes caused by a single user
● AI cat & mouse
○ Information needs not (yet) covered by AI annotations get modelled with keywords
○ E.g. “according to”, “said”, “declared” => Quote detection
○ E.g. positive/negative words => Sentiment
○ More and better Entities & Topics
● Queries with lots of terms are expensive!
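To illustrate why modelling information needs with keywords gets expensive, here is a minimal sketch that turns a list of phrases into a `bool` query with one `match_phrase` clause per phrase. The field name `body` and the helper name `keyword_query` are hypothetical; the slides don't show Signal's actual queries.

```python
import json


def keyword_query(phrases, field="body"):
    """Build a bool query matching any of the given phrases (hypothetical helper).

    Each phrase becomes its own match_phrase clause, so the query grows
    linearly with the number of terms an information need requires.
    """
    return {
        "query": {
            "bool": {
                "should": [{"match_phrase": {field: phrase}} for phrase in phrases],
                "minimum_should_match": 1,
            }
        }
    }


# Quote detection modelled with keywords, as in the slide's example
quote_terms = ["according to", "said", "declared"]
print(json.dumps(keyword_query(quote_terms), indent=2))
```

Every extra phrase adds another clause that has to be evaluated on every shard the search touches, which is why a 16-term search can cost far more than a 4-term one.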
Signal’s Performance & Scaling Journey
Signal’s Performance & Scaling Journey
● Be pragmatic
● Add more nodes!
● Monitoring, identify resource bottlenecks *
● Upgrade to the latest ES version
● Identify and improve expensive searches *
● Find the right machine type
● Find the right number of indices and shards *
● Build a (mental) model for query cost
Monitoring
● End-user latency
● Search queue & rejected searches
● CPU
● Memory
● Garbage collection: Old Gen (new JDKs are coming!)
● IO: Ops & Bytes/s
● Field Data
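Several of the metrics above can be read from the Elasticsearch node stats API. A minimal sketch, assuming a cluster reachable at `localhost:9200`; it covers only the search queue, rejections, heap, old-gen GC and field data (end-user latency, CPU and IO need other sources). The function names are hypothetical, and Signal's actual monitoring stack isn't described in the slides.

```python
import json
import urllib.request


def fetch_node_stats(host="http://localhost:9200"):
    """GET only the node stats sections needed below."""
    url = f"{host}/_nodes/stats/indices,jvm,thread_pool"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def summarise_node_stats(stats):
    """Pick a few of the monitored metrics out of a parsed _nodes/stats response."""
    summary = {}
    for node_id, node in stats["nodes"].items():
        search_pool = node["thread_pool"]["search"]
        old_gc = node["jvm"]["gc"]["collectors"]["old"]
        summary[node.get("name", node_id)] = {
            "search_queue": search_pool["queue"],        # searches waiting right now
            "search_rejected": search_pool["rejected"],  # cumulative since node start
            "old_gc_count": old_gc["collection_count"],
            "old_gc_time_ms": old_gc["collection_time_in_millis"],
            "heap_used_percent": node["jvm"]["mem"]["heap_used_percent"],
            "fielddata_bytes": node["indices"]["fielddata"]["memory_size_in_bytes"],
        }
    return summary
```

Usage: `print(summarise_node_stats(fetch_node_stats()))`. Since `rejected` and the GC values are cumulative counters, a dashboard would typically plot their rate of change rather than the raw value.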
Replay Live Traffic
● Log all queries at source
● Miniature production
○ Proportionally fewer/smaller servers and less data
● Consider warming up caches
● Goal A: Experiment with optimisations
○ Replay in real-time
○ Watch impact with monitoring
○ Tune one thing and repeat
● Goal B: Identify expensive searches
○ Replay one search at a time
○ Filter by latency or metrics for single searches - how?
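A minimal sketch of Goal B: replay logged searches one at a time and keep those whose server-side `took` time exceeds a threshold. The log format (a list of `(index, body)` pairs), the 500ms threshold and the function names are assumptions; the slides don't say how Signal logs queries. The optional warm-up runs each search once, untimed, so the measurement reflects warm caches.

```python
import json
import time
import urllib.request


def es_search(host, index, body):
    """POST one search body to Elasticsearch and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def replay_one_at_a_time(logged_queries, search, threshold_ms=500, warm_up=False):
    """Replay logged searches sequentially and return the slow ones.

    logged_queries: iterable of (index, body) pairs (hypothetical log format).
    search: callable(index, body) returning a response dict with "took" in ms.
    """
    slow = []
    for index, body in logged_queries:
        if warm_up:
            search(index, body)  # untimed run to fill caches first
        start = time.perf_counter()
        response = search(index, body)
        wall_ms = (time.perf_counter() - start) * 1000
        if response["took"] >= threshold_ms:
            slow.append({
                "index": index,
                "body": body,
                "took_ms": response["took"],   # time spent inside Elasticsearch
                "wall_ms": round(wall_ms, 1),  # includes network and client overhead
            })
    return slow
```

Against a local cluster this could be called as `replay_one_at_a_time(queries, lambda i, b: es_search("http://localhost:9200", i, b))`. Filtering on `took` rather than wall-clock time excludes network and client overhead; both are kept so they can be compared.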
Live Experiments
Live Experiment
● Docker Compose Stack + Python/Shell Scripts
https://github.com/joachimdraeger/elasticsearch-performance-experiments
● The Signal Media One-Million News Articles Dataset
https://research.signalmedia.co/newsir16/signal-dataset.html
One month of articles, September 2015
● Indexed in 3 different ways:
○ Daily indices with 5 shards each, e.g. articles-daily-20150901
○ One index with 5 shards (articles-5)
○ One index with 1 shard (articles-1)
● One search with 4 terms, one with 16 terms
● Repeat each search 1000x
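A minimal sketch of the three index layouts above, expressed as create-index bodies for `PUT /<index>` with `number_of_shards`. The function names are hypothetical, and `number_of_replicas: 0` is an assumption for a single-node Docker cluster; the scripts in the GitHub repository may set things up differently.

```python
from datetime import date, timedelta


def daily_index_names(start, end, prefix="articles-daily-"):
    """One name per day from start to end inclusive, e.g. articles-daily-20150901."""
    names = []
    day = start
    while day <= end:
        names.append(prefix + day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return names


def index_settings(shards, replicas=0):
    """Body for PUT /<index>; replicas=0 assumes a single-node test cluster."""
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


september_2015 = daily_index_names(date(2015, 9, 1), date(2015, 9, 30))
layouts = {
    "daily": {name: index_settings(5) for name in september_2015},
    "articles-5": {"articles-5": index_settings(5)},
    "articles-1": {"articles-1": index_settings(1)},
}

for layout, indices in layouts.items():
    primaries = sum(body["settings"]["number_of_shards"] for body in indices.values())
    print(f"{layout}: {len(indices)} index(es), {primaries} primary shards")
```

With the daily layout, one search over the whole month has to hit 30 × 5 = 150 shards, compared with 5 or 1 for the single-index layouts.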
Monitoring for Performance Test?
What does this mean??
Metric counters for experiments
curl 'localhost:9200/_nodes/stats?pretty'
{
  "cluster_name" : "docker-cluster",
  "nodes" : {
    "napxVuf_QnO8T7Z41HBKTg" : {
      "ip" : "192.168.80.2:9300",
      ...
      "indices" : {
        "search" : {
          "query_total" : 3900440,
          "query_time_in_millis" : 1311173
        },
        "query_cache" : {
          "hit_count" : 2394107,
          "miss_count" : 212573,
          "evictions" : 0
        }
      },
      "process" : {
        "cpu" : {
          "total_in_millis" : 4726640
        },
        ...
1. Get metric counter(s)
2. Execute search (n times)
3. Get metric counter(s)
4. Calculate difference
=> metrics.py
Repeat searches n times for more precise readings.
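A minimal sketch of steps 1 to 4, assuming the cluster is otherwise idle so that counter deltas come only from the experiment. This is not the `metrics.py` from the repository; `COUNTERS`, `read_counters` and `measure` are hypothetical names, and the counter paths are the ones in the `_nodes/stats` excerpt above, summed over all nodes. Note that `query_total` counts shard-level query executions, so one search over a 5-shard index adds 5 to it.

```python
import json
import urllib.request

# Counter paths from the _nodes/stats excerpt (hypothetical selection)
COUNTERS = {
    "query_total": ("indices", "search", "query_total"),
    "query_time_ms": ("indices", "search", "query_time_in_millis"),
    "cache_hits": ("indices", "query_cache", "hit_count"),
    "cache_misses": ("indices", "query_cache", "miss_count"),
    "cpu_ms": ("process", "cpu", "total_in_millis"),
}


def fetch_node_stats(host="http://localhost:9200"):
    """GET /_nodes/stats and return the parsed JSON."""
    with urllib.request.urlopen(f"{host}/_nodes/stats") as response:
        return json.load(response)


def read_counters(stats):
    """Sum each counter over all nodes in a _nodes/stats response."""
    totals = dict.fromkeys(COUNTERS, 0)
    for node in stats["nodes"].values():
        for name, path in COUNTERS.items():
            value = node
            for key in path:
                value = value[key]
            totals[name] += value
    return totals


def measure(run_searches, n, get_stats=fetch_node_stats):
    """Read counters, run n searches, read counters again, return the differences."""
    before = read_counters(get_stats())   # 1. get metric counters
    run_searches(n)                       # 2. execute search n times
    after = read_counters(get_stats())    # 3. get metric counters
    delta = {name: after[name] - before[name] for name in COUNTERS}  # 4. difference
    per_search = {name: value / n for name, value in delta.items()}
    return delta, per_search
```

`run_searches` is whatever issues the search under test n times (the slides use n = 1000). Dividing `cache_hits` by `cache_hits + cache_misses` from the delta gives the query cache hit ratio for just this experiment.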
Live Experiment
● Docker Compose Stack
● Signal’s 1M articles data set
● Scripts for indexing
● 2 searches about VW diesel
● Script to run 1000 searches
● metrics.py to collect stats
● On GitHub:
tinyurl.com/esperf-2018
Performance Experiment Results
Summary
Last words on shards...
● The default number of shards will change from [5] to [1] in 7.0.0
● Huge shards are more efficient to search (50GB!)
● One shard per server!?
● Huge shards can be difficult to move/recover
● Multiple shards => parallel indexing/searching
● Replicas for failover and balancing load
● Consider bi-weekly/monthly/quarterly/yearly indices
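To turn the shard guidance above into numbers, a back-of-the-envelope sketch: given a total index size, a node count and a replica count, how many primaries keep each shard at or below the 50GB mentioned above, and how many shard copies land on each node. `plan_shards` is a hypothetical helper, not an Elasticsearch API, and the inputs in the example call are hypothetical, not Signal's figures.

```python
import math


def plan_shards(total_gb, nodes, replicas=1, max_shard_gb=50):
    """Back-of-the-envelope shard plan (hypothetical helper)."""
    # Fewest primaries that keep every shard at or below max_shard_gb
    primaries = max(1, math.ceil(total_gb / max_shard_gb))
    # Each primary has `replicas` extra copies that also need a node
    total_copies = primaries * (1 + replicas)
    return {
        "primaries": primaries,
        "shard_gb": total_gb / primaries,
        "copies_per_node": total_copies / nodes,
    }


# Hypothetical inputs: 600GB of data on 6 nodes with one replica
print(plan_shards(600, nodes=6, replicas=1))
# {'primaries': 12, 'shard_gb': 50.0, 'copies_per_node': 4.0}
```

Fewer, larger shards reduce how many shards each search fans out to; the trade-off, as the slide notes, is that huge shards are slower to move and recover.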
Summary
● Metric counters are great for measuring experiments
● Shards are expensive
● So are (many) terms!
● Elasticsearch use cases are diverse: it depends!
Further Sources
https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
https://www.elastic.co/blog/signal-media-optimizing-for-more-elasticsearch-power-with-less-elasticsearch-cluster
Any Questions?
tinyurl.com/esperf-2018
@joachimdraeger
Thank you!
We are hiring!
tinyurl.com/signal-engineering-video
linkedin.com/company/signalmedia/
signalmedia.co/solve-big-challenges/
tinyurl.com/esperf-2018
@joachimdraeger

