Lessons Learned While Scaling Elasticsearch at Vinted
$ whoami
{
  "name": "Dainius Jocas",
  "company": {
    "name": "Vinted",
    "mission": "Make second-hand the first choice worldwide"
  },
  "role": "Staff Engineer",
  "website": "https://www.jocas.lt",
  "twitter": "@dainius_jocas",
  "github": "dainiusjocas",
  "author_of_oss": ["lucene-grep"]
}
Agenda
1. Intro
2. Scale as of January, 2020
3. IDs as keywords
4. Dates
5. function_score
6. Scale as of January, 2021
7. Discussion
Intro: Vinted and Elasticsearch
- Vinted is a second-hand clothes marketplace
- Operates in 10+ countries, 10+ languages
- Elasticsearch has been in use since 2014
  - Elasticsearch 1.4.1
- Today I’ll share lessons learned while scaling Elasticsearch at Vinted
Elasticsearch Scale @Vinted as of January 2020 (1)
- Elasticsearch 5.8.6 (~end-of-life at that time)
- 1 cluster of ~420 data nodes (each 14 CPU with 64GB RAM, bare metal)
- ~300K requests per minute (RPM) during peak hours
- ~160M documents
- 84 shards with 4 replicas
- p99 latency during peak hours was ~250 ms
- The number of Slow Log queries (i.e. latency >100 ms) skyrocketed during peak hours
Elasticsearch Scale @Vinted as of January 2020 (2)
- The company saw Elasticsearch as a risk(!)
- Back-end developers didn’t want to touch it
- The usage of the Vinted platform was expected to at least double by October
(with a similar increase in Elasticsearch usage; just adding more servers is a bad idea)
- Functionality on top of Elasticsearch had just accumulated over the years, with no
oversight and no clear ownership
- SREs were on Elasticsearch duty (hint: a server restart doesn’t help all that
much when the Elasticsearch cluster is overloaded)
Replayed 10k queries during the “easy Tuesday”
Adventures
IDs as keywords (1)
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "country_id": [2, 4]
          }
        }
      ]
    }
  }
}
IDs as keywords (2): context
- Elasticsearch indexed data from MySQL (see the vinted.engineering blog for
details)
- Common practice for Ruby on Rails apps is to create database tables with
primary keys as auto-increment integers
- Q: Which Elasticsearch data type to use?
- A: integer, because why not?
IDs as keywords (3)
IDs as keywords (4): TL;DR of the blog post
- Before: integers were indexed as padded string terms
- After: integers are indexed as block k-d trees (BKD)
- This Lucene change has been part of Elasticsearch since 5.0
- Numeric data types were optimized for range queries
- Numeric data types still support terms queries
IDs as keywords (5): from Vinted point of view
- On IDs we don’t do range queries
- We use IDs for simple terms filtering
- For our use case, the “optimized” integer data type degraded performance
- How much?
  - For our workload, switching the IDs to keyword gave an instant ~15% (~20 ms) decrease in p99 latency
- We use around 10 such fields for filtering in every search query
- The required change was as simple as changing the index mappings and
reindexing the data
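The mapping change described above can be sketched roughly as follows. This is a minimal illustration that assumes the mapping is available as a plain dictionary; the helper name `ids_to_keyword` and the `brand_id` and `title` fields are hypothetical (only `country_id` appears in the slides). It only rewrites the mapping; creating the new index and reindexing the data into it are separate steps.

```python
import copy


def ids_to_keyword(mapping, id_fields):
    """Return a copy of `mapping` where the given integer ID fields become keyword."""
    new_mapping = copy.deepcopy(mapping)
    props = new_mapping["properties"]
    for field in id_fields:
        # Only touch fields that are currently mapped as integer.
        if props.get(field, {}).get("type") == "integer":
            props[field]["type"] = "keyword"
    return new_mapping


# Hypothetical mapping; brand_id and title are made up for illustration.
old_mapping = {
    "properties": {
        "country_id": {"type": "integer"},
        "brand_id": {"type": "integer"},
        "title": {"type": "text"},
    }
}

new_mapping = ids_to_keyword(old_mapping, ["country_id", "brand_id"])
print(new_mapping["properties"]["country_id"])  # {'type': 'keyword'}
```

Fields that are not listed, or that are not integers, are left untouched, so the helper can be pointed at the ~10 ID fields used for filtering without affecting the rest of the mapping.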
IDs as keywords (6): summary
- Remember that Vinted has used Elasticsearch since before 5.0
- At that time it was OK to index IDs as Elasticsearch integers
- Post 5.0, IDs as integers became a performance issue
- Such a change, which breaks nothing, can easily slip under the regular
developers’ radar and then backfire badly
- Regular developers then think that Elasticsearch performs badly
Dates
- Date math
- Date filters
Date Math (1)
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "created_at": {
              "gte": "now-7d"
            }
          }
        }
      ]
    }
  }
}
Date Math (2)
- Most queries that use `now` (see Date Math) cannot be cached
- From the developer’s point of view it is simple: you just hardcode `now-7d`
- If most of your queries use Date Math, then CPU usage increases
- With more traffic the cluster starts to have “hot nodes” and queueing
- My advice: always use explicit timestamps in production
- Cached queries -> massive gains in the cluster throughput
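One way to follow this advice is to compute the timestamp in the application and send it as a literal value. The sketch below is an assumption about how that could look, not Vinted's code: the helper name `since_filter` and the 5-minute rounding window are illustrative choices, `now` is assumed to be a UTC datetime, and the window is assumed to divide 60 evenly. Rounding keeps the value identical across requests within the same window, which gives the cache something to hit.

```python
from datetime import datetime, timedelta, timezone


def since_filter(field, now, days, granularity_minutes=5):
    """Build a range filter with an explicit timestamp instead of `now-<days>d`."""
    cutoff = now - timedelta(days=days)
    # Round down so that all requests within the same window share one value.
    minute = cutoff.minute - cutoff.minute % granularity_minutes
    cutoff = cutoff.replace(minute=minute, second=0, microsecond=0)
    # The literal "Z" assumes `now` is in UTC.
    return {"range": {field: {"gte": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")}}}


now = datetime(2021, 5, 27, 9, 57, 42, tzinfo=timezone.utc)
query = {"query": {"bool": {"filter": [since_filter("created_at", now, days=7)]}}}
print(query["query"]["bool"]["filter"][0])
# {'range': {'created_at': {'gte': '2021-05-20T09:55:00Z'}}}
```

Rounding down makes the filter slightly more inclusive (by at most one window), which is usually an acceptable trade for a "last 7 days" style filter.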
Date filters (1)
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "created_at": {
              "lte": "2021-05-27T09:55:00Z"
            }
          }
        }
      ]
    }
  }
}
Date filters (2)
- This query clause asks Elasticsearch to “collect docs that are not newer than
X”
- What if X is meant to be now and your documents have accumulated over the
last 10 years?
- Then this filter matches ~99% of all docs in your indices
- Not a good “filter”
Date filters (3)
{
  "query": {
    "bool": {
      "must_not": [
        {
          "range": {
            "created_at": {
              "gt": "2021-05-27T09:55:00Z"
            }
          }
        }
      ]
    }
  }
}
Date filters (4)
- This query clause asks Elasticsearch to “exclude docs that are newer than
X”
- What if X is now and your documents have accumulated over 10 years?
- Then the range clause matches ~1% of all docs in your indices
- A good “filter”, i.e. a more specific filter is a good filter
- From docs: “if you must filter by timestamp, use a coarse granularity (e.g.
round timestamp to 5 minutes) so the query value changes infrequently”
- For our setup it reduced the p99 by ~15%, ~10ms
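To make the selectivity argument concrete, here is a small in-memory sketch. It is not Elasticsearch code: it assumes a hypothetical set of 1,000 documents with evenly spaced integer timestamps and a cutoff X that leaves 1% of them newer, then counts how many documents the range clause has to match in each formulation.

```python
# Hypothetical data: 1,000 documents with evenly spaced creation timestamps.
created_at = list(range(1000))
x = 989  # cutoff; 10 documents are newer than this

# "Date filters (1)": the filter clause itself has to match these documents.
lte_matches = [t for t in created_at if t <= x]

# "Date filters (3)": the must_not clause only has to match these documents...
gt_matches = set(t for t in created_at if t > x)
# ...and everything else is kept.
kept = [t for t in created_at if t not in gt_matches]

print(len(lte_matches), len(gt_matches))  # 990 10
print(kept == lte_matches)                # True
```

Both formulations keep the same documents; the difference is that the range clause in the `must_not` variant matches 10 documents instead of 990.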
Dates: summary
- Don’t use Date Math in production
- Write timestamp filters so that they match fewer documents
Function score (1)
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": {
              "params": {
                "now": 1579605387087
              },
              "source": "now > doc['user_updated_at'].value ? 1 / ln(now - doc['user_updated_at'].value + 1) : 0",
              "lang": "expression"
            }
          }
        }
      ]
    }
  }
}
Function score (2)
- Models boosting on novelty
- Multiplies _score
- Why a custom script score? Why not some decay function?
- Decay functions performed worse in our benchmarks
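For readers who want to see what the script computes, here is a plain Python port of the expression from the previous slide. It is only a sketch for inspecting the numbers: the function name `novelty_boost` and the sample ages are assumptions, and timestamps are taken to be epoch milliseconds, as the `now` parameter suggests.

```python
import math


def novelty_boost(now_ms, user_updated_at_ms):
    """Python port of the script: 1 / ln(now - updated + 1), or 0 if not older than now."""
    if now_ms > user_updated_at_ms:
        return 1.0 / math.log(now_ms - user_updated_at_ms + 1)
    return 0.0


NOW_MS = 1579605387087  # the `now` parameter from the slide
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS

# Hypothetical document ages, for illustration only.
for label, age_ms in [("1 hour", HOUR_MS), ("7 days", 7 * DAY_MS), ("10 years", 3650 * DAY_MS)]:
    print(label, novelty_boost(NOW_MS, NOW_MS - age_ms))
```

Because the script has to be evaluated for each matching document to produce this multiplier, its cost grows with the number of hits.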
Function score (3): problems
- This clause was the biggest contributor to the query latency
- The script is executed on every hit
- Problematic for queries that have many hits, i.e. > 10k
- Sharding helped keep the latency down (remember we had 84 shards for
160M documents)
- But at some point sharding turns into oversharding and no longer helps
- The task was to come up with a way to replace the function_score with a
more performant solution while preserving the ranking logic
Function score (4): distance_feature
{
  "query": {
    "bool": {
      "should": [
        {
          "distance_feature": {
            "field": "user_updated_at",
            "origin": "2021-05-27T10:59:00Z",
            "pivot": "7d"
          }
        }
      ]
    }
  }
}
Function score (5): distance_feature
- A similar boost on novelty
- Multiplicative boost turned into additive boost (big change in logic)
- Unlike the function_score query or other ways to change relevance scores,
the distance_feature query efficiently skips non-competitive hits
(remember those 10 year old documents)
- NOTE: the origin value is not set to `now` because of the Date Math!
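The Elasticsearch documentation gives the `distance_feature` score as `boost * pivot / (pivot + distance)`. The sketch below evaluates that formula in plain Python for the 7-day pivot used above; the function name and the sample ages are assumptions chosen for illustration, and the default boost of 1.0 is assumed.

```python
def distance_feature_score(distance_ms, pivot_ms, boost=1.0):
    """Score contribution per the documented formula: boost * pivot / (pivot + distance)."""
    return boost * pivot_ms / (pivot_ms + distance_ms)


DAY_MS = 24 * 60 * 60 * 1000
PIVOT_MS = 7 * DAY_MS  # "pivot": "7d"

# Hypothetical ages (distance between `origin` and `user_updated_at`), in days.
for age_days in (0, 7, 70, 3650):
    score = distance_feature_score(age_days * DAY_MS, PIVOT_MS)
    print(age_days, round(score, 4))
# 0 1.0
# 7 0.5
# 70 0.0909
# 3650 0.0019
```

A document exactly one pivot away from the origin gets half of the boost, and a 10-year-old document contributes almost nothing. Since the clause sits in `should`, this value is added to the rest of the score rather than multiplied with it.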
Function score (6): summary
- Function score provides flexible scoring options
- However, when applying it to big indices you must be careful
- Function score is applied to all hits
- No way for Elasticsearch to skip hits
- Caching is the single most important thing for good Elasticsearch performance and the
function score query doesn’t play well with it
- The distance feature query clause might be an option to replace the function
score if you have issues with performance
Elasticsearch Scale @Vinted as of January 2021 (1)
- ES 7.9.3
- 3 clusters each ~160 data nodes (each 16 CPU with 64GB RAM, bare metal)
- An offline cluster of similar size for testing (upgrades, cluster setup, etc.)
- ~1000K RPM during peak hours
- ~360M documents
- p99 latency during peaks ~150 ms
- Timeouts (>500ms) are 0.0367% of all queries
Elasticsearch Scale @Vinted as of January 2021 (2)
- A team (8 people strong) is responsible for Elasticsearch
- Regular capacity testing for 2x load in terms of
  - document count
  - query throughput
- Elasticsearch is seen as a system that can handle the growth
- Functionality is performance-tested before being released to production
- The team members rotate on duty
  - Keeping clusters operational
  - Maintenance tasks from the backlog
- Is everything perfect? No.
  - Elasticsearch is resource hungry
  - Version upgrades still have to be checked before releasing
  - The offline testing cluster helps with that
  - Machine Learning engineers insist that Elasticsearch is not up to their tasks
  - Despite the fact that the search ranking data is used for their model training
  - Search re-ranking is done outside of Elasticsearch (e.g. operational complexity)
  - The default Elasticsearch installation offers very few tools for search relevance
  work
Discussion (1)
Discussion (2)
Thank You!

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 

Lessons Learned While Scaling Elasticsearch at Vinted

  • 2. $ whoami
    {
      "name": "Dainius Jocas",
      "company": {
        "name": "Vinted",
        "mission": "Make second-hand the first choice worldwide"
      },
      "role": "Staff Engineer",
      "website": "https://www.jocas.lt",
      "twitter": "@dainius_jocas",
      "github": "dainiusjocas",
      "author_of_oss": ["lucene-grep"]
    }
  • 3. Agenda
    1. Intro
    2. Scale as of January, 2020
    3. IDs as keywords
    4. Dates
    5. function_score
    6. Scale as of January, 2021
    7. Discussion
  • 4. Intro: Vinted and Elasticsearch
    - Vinted is a second-hand clothes marketplace
    - Operates in 10+ countries, 10+ languages
    - Elasticsearch has been in use since 2014
      - Starting with Elasticsearch 1.4.1
    - Today I’ll share lessons learned while scaling Elasticsearch at Vinted
  • 5. Elasticsearch Scale @Vinted as of January 2020 (1)
    - Elasticsearch 5.8.6 (~end-of-life at that time)
    - 1 cluster of ~420 data nodes (each 14 CPU with 64GB RAM, bare metal)
    - ~300K requests per minute (RPM) during peak hours
    - ~160M documents
    - 84 shards with 4 replicas
    - p99 latency during peak hours was ~250 ms
    - The number of Slow Log queries (i.e. latency >100 ms) skyrocketed during peak hours
  • 6. Elasticsearch Scale @Vinted as of January 2020 (2)
    - The company saw Elasticsearch as a risk(!)
    - Back-end developers didn’t want to touch it
    - The usage of the Vinted platform was expected to at least double by October (implying a similar increase in Elasticsearch usage; just adding more servers is a bad idea)
    - Functionality on top of Elasticsearch just accumulated over the years, with no oversight and no clear ownership
    - SREs were on Elasticsearch duty (hint: a server restart doesn’t help all that much when the Elasticsearch cluster is overloaded)
  • 7. Replayed 10k queries during the “easy Tuesday”
  • 9. IDs as keywords (1)
    {
      "query": {
        "bool": {
          "filter": [
            {
              "terms": {
                "country_id": [2, 4]
              }
            }
          ]
        }
      }
    }
  • 10. IDs as keywords (2): context
    - Elasticsearch indexed data from MySQL (check the vinted.engineering blog for details on that)
    - Common practice for Ruby on Rails apps is to create database tables with primary keys as auto-increment integers
    - Q: Which Elasticsearch data type to use?
    - A: integer, because why not?
  • 12. IDs as keywords (4): TL;DR of the blog post
    - Before: integers were indexed as padded string terms
    - After: integers are indexed in a block k-d (BKD) tree
    - This Lucene change has been part of Elasticsearch since 5.0
    - Numeric data types were optimized for range queries
    - Numeric data types still support terms queries
  • 13. IDs as keywords (5): from Vinted’s point of view
    - On IDs we don’t do range queries
    - We use IDs for simple terms filtering
    - The “optimized” integer data type degraded performance for our use case
    - How much?
      - For our workload, indexing IDs as keywords gave an instant ~15% decrease in p99 latency (~20 ms)
    - We use around 10 such fields for filtering in every search query
    - The required change was as simple as changing the index mappings and reindexing the data
  • 14. IDs as keywords (6): summary
    - Remember that Vinted has used Elasticsearch since before 5.0
    - At that time it was OK to index IDs as Elasticsearch integers
    - Post 5.0, IDs as integers became a performance issue
    - Such a change, which breaks nothing, can easily slip under the regular developer’s radar and then backfire badly
    - Regular developers then conclude that Elasticsearch performs badly
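A minimal sketch of the mapping change described above, assuming a typeless (Elasticsearch 7-style) mapping body. Only `country_id` comes from the slides; `brand_id` and `user_id` are hypothetical field names, and the helper functions are illustrative rather than Vinted's actual code.

```python
import json

# Hypothetical ID fields that are only ever used for exact-match filtering.
ID_FIELDS = ["country_id", "brand_id", "user_id"]


def keyword_id_mapping(id_fields):
    """Build an index mapping that stores each ID field as a keyword."""
    return {
        "mappings": {
            "properties": {name: {"type": "keyword"} for name in id_fields}
        }
    }


def terms_filter(field, ids):
    """Build a bool/filter query with a terms clause; IDs are sent as strings."""
    return {
        "query": {
            "bool": {
                "filter": [{"terms": {field: [str(i) for i in ids]}}]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(keyword_id_mapping(ID_FIELDS), indent=2))
    print(json.dumps(terms_filter("country_id", [2, 4]), indent=2))
```

The query shape stays the same as on the "IDs as keywords (1)" slide; only the field type in the mapping changes, which is why a reindex is needed.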
  • 15. Dates - Date math - Date filters
  • 16. Date Math (1)
    {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "created_at": {
                  "gte": "now-7d"
                }
              }
            }
          ]
        }
      }
    }
  • 17. Date Math (2)
    - Most queries that use now (see Date Math) cannot be cached
    - From a developer’s POV it is simple: you just hardcode `now-7d`
    - If most of your queries use Date Math, CPU usage increases
    - With more traffic the cluster starts to have “hot nodes” and queueing
    - My advice: always use timestamps in production
    - Cached queries -> massive gains in cluster throughput
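One way to follow this advice is to compute the lower bound in the application and send it as an absolute timestamp. The sketch below is an illustration under assumptions: the helper names are made up, and rounding down to 5 minutes is just one possible granularity (it must divide 60 for this simple rounding to work). Identical rounded values are sent for every request in the same 5-minute window, which is what gives the cache a chance to help.

```python
from datetime import datetime, timedelta, timezone


def rounded_utc(now, granularity_minutes=5):
    """Round an aware datetime down to a multiple of `granularity_minutes` (UTC)."""
    now = now.astimezone(timezone.utc)
    minute = now.minute - now.minute % granularity_minutes
    return now.replace(minute=minute, second=0, microsecond=0)


def created_since(now, days=7, granularity_minutes=5):
    """Build a range clause equivalent to `now-7d`, but with a fixed timestamp."""
    lower = rounded_utc(now, granularity_minutes) - timedelta(days=days)
    return {
        "range": {
            "created_at": {"gte": lower.strftime("%Y-%m-%dT%H:%M:%SZ")}
        }
    }


if __name__ == "__main__":
    print(created_since(datetime.now(timezone.utc)))
```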
  • 18. Date filters (1)
    {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "created_at": {
                  "lte": "2021-05-27T09:55:00Z"
                }
              }
            }
          ]
        }
      }
    }
  • 19. Date filters (2)
    - This query clause asks Elasticsearch to “collect docs that are not newer than X”
    - What if X is meant to be now and your documents have accumulated over the last 10 years?
    - Then this filter matches ~99% of all docs in your indices
    - Not a good “filter”
  • 20. Date filters (3)
    {
      "query": {
        "bool": {
          "must_not": [
            {
              "range": {
                "created_at": {
                  "gt": "2021-05-27T09:55:00Z"
                }
              }
            }
          ]
        }
      }
    }
  • 21. Date filters (4)
    - This query clause also asks Elasticsearch to “collect docs that are not newer than X”
    - What if X is now and your documents have accumulated over 10 years?
    - Then the range clause inside must_not matches ~1% of all docs in your indices
    - A good “filter”, i.e. a more specific filter is a good filter
    - From the docs: “if you must filter by timestamp, use a coarse granularity (e.g. round timestamp to 5 minutes) so the query value changes infrequently”
    - For our setup it reduced the p99 latency by ~15% (~10 ms)
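The difference between the two formulations can be illustrated on toy data. The sketch below is not how Elasticsearch evaluates the clauses internally; it only counts how many documents each range clause has to match. The data set is made up, and `not_newer_than` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def not_newer_than(timestamp):
    """Exclude docs newer than `timestamp` instead of including all older ones."""
    return {
        "bool": {
            "must_not": [{"range": {"created_at": {"gt": timestamp}}}]
        }
    }


# Toy data (made up): one doc per day for ~10 years, plus 3 docs newer than X.
X = datetime(2021, 5, 27, 9, 55, tzinfo=timezone.utc)
docs = [X - timedelta(days=d) for d in range(3650)]
docs += [X + timedelta(minutes=m) for m in (1, 2, 3)]

lte_matches = [d for d in docs if d <= X]  # what `lte X` has to collect
gt_matches = [d for d in docs if d > X]    # what `gt X` has to collect
kept = [d for d in docs if not d > X]      # result of must_not(gt X)

query = not_newer_than(X.strftime("%Y-%m-%dT%H:%M:%SZ"))
print(len(lte_matches), len(gt_matches), kept == lte_matches)
# prints: 3650 3 True
```

One caveat with the inverted form: a `must_not` range clause also keeps documents that have no `created_at` value at all, whereas the `lte` filter drops them, so the two are only equivalent when the field is always present.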
  • 24. Dates: summary
    - Don’t use Date Math in production
    - Write timestamp filters so that they match fewer documents
  • 25. Function score (1)
    {
      "query": {
        "function_score": {
          "functions": [
            {
              "script_score": {
                "script": {
                  "params": {
                    "now": 1579605387087
                  },
                  "source": "now > doc['user_updated_at'].value ? 1 / ln(now - doc['user_updated_at'].value + 1) : 0",
                  "lang": "expression"
                }
              }
            }
          ]
        }
      }
    }
  • 26. Function score (2)
    - Models boosting on novelty
    - Multiplies _score
    - Why a custom script score? Why not some decay function?
      - Decay functions performed worse in our benchmarks
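To get a feel for the shape of this boost, the script's formula can be evaluated outside Elasticsearch. The sketch below mirrors the expression from the "Function score (1)" slide in plain Python (`ln` is the natural logarithm); the function name and the sample ages are illustrative assumptions.

```python
import math


def novelty_boost(now_ms, user_updated_at_ms):
    """Mirror of the script: 1 / ln(age_ms + 1) for past updates, else 0."""
    if now_ms > user_updated_at_ms:
        return 1.0 / math.log(now_ms - user_updated_at_ms + 1)
    return 0.0


NOW_MS = 1579605387087  # the `now` param from the slide
HOUR_MS = 3_600_000
DAY_MS = 24 * HOUR_MS
YEAR_MS = 365 * DAY_MS

if __name__ == "__main__":
    for label, age in [("1 hour", HOUR_MS), ("1 day", DAY_MS), ("1 year", YEAR_MS)]:
        print(label, round(novelty_boost(NOW_MS, NOW_MS - age), 4))
```

Because of the logarithm the multiplier decays very slowly with age, and it has to be computed for every matching document.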
  • 27. Function score (3): problems
    - This clause was the biggest contributor to the query latency
    - The script is executed on every hit
    - Problematic for queries that have many hits, i.e. > 10k
    - Sharding helped keep the latency down (remember we had 84 shards for 160M documents)
    - But at some point sharding turns into oversharding and no longer helps
    - The task was to come up with a way to replace the function_score with a more performant solution while preserving the ranking logic
  • 28. Function score (4): distance_feature
    {
      "query": {
        "bool": {
          "should": [
            {
              "distance_feature": {
                "field": "user_updated_at",
                "origin": "2021-05-27T10:59:00Z",
                "pivot": "7d"
              }
            }
          ]
        }
      }
    }
  • 29. Function score (5): distance_feature
    - A similar boost on novelty
    - Multiplicative boost turned into additive boost (big change in logic)
    - Unlike the function_score query or other ways to change relevance scores, the distance_feature query efficiently skips non-competitive hits (remember those 10-year-old documents)
    - NOTE: the origin value is an absolute timestamp rather than `now` because of the Date Math issue!
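The Elasticsearch documentation gives the distance_feature score as `boost * pivot / (pivot + distance)`. The sketch below evaluates that formula in plain Python for the `origin` and `pivot` from the slide. The function name is an assumption, and it works in seconds, whereas Elasticsearch measures date distances in milliseconds; the ratio is the same.

```python
from datetime import datetime, timedelta, timezone


def distance_feature_score(origin, value, pivot, boost=1.0):
    """Score added by distance_feature: boost * pivot / (pivot + distance)."""
    distance = abs((origin - value).total_seconds())
    pivot_s = pivot.total_seconds()
    return boost * pivot_s / (pivot_s + distance)


ORIGIN = datetime(2021, 5, 27, 10, 59, tzinfo=timezone.utc)
PIVOT = timedelta(days=7)

if __name__ == "__main__":
    for age_days in (0, 7, 70, 3650):
        value = ORIGIN - timedelta(days=age_days)
        print(age_days, round(distance_feature_score(ORIGIN, value, PIVOT), 4))
```

A document updated exactly at `origin` gets the full boost, one updated `pivot` (7 days) earlier gets half of it, and the contribution is added to the rest of the bool query's score rather than multiplied into it.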
  • 30. Function score (6): summary
    - Function score provides flexible scoring options
    - However, when applying it to big indices you must be careful
    - Function score is applied to all hits
    - There is no way for Elasticsearch to skip hits
    - Caching is the single most important thing for good Elasticsearch performance, and the function score query doesn’t play well with it
    - The distance feature query clause might be an option to replace the function score if you have performance issues
  • 31. Elasticsearch Scale @Vinted as of January 2021 (1)
    - ES 7.9.3
    - 3 clusters of ~160 data nodes each (each node 16 CPU with 64GB RAM, bare metal)
    - An offline cluster of similar size for testing (upgrades, cluster setup, etc.)
    - ~1000K RPM during peak hours
    - ~360M documents
    - p99 latency during peaks ~150 ms
    - Timeouts (>500ms) are 0.0367% of all queries
  • 32. Elasticsearch Scale @Vinted as of January 2021 (2)
    - Team (8 people strong) is responsible for Elasticsearch
    - Regular capacity testing for 2x load in terms of
      - document count
      - query throughput
    - Elasticsearch is seen as the system that can handle the growth
    - Functionality is tested performance-wise before releasing to production
    - The team members rotate on duty
      - Keeping clusters operational
      - Maintenance tasks from backlog
  • 33. Discussion (1)
    - Is everything perfect? No.
    - Elasticsearch is resource hungry
    - Version upgrades still have to be checked before releasing
      - The offline testing cluster helps with that
    - Machine Learning engineers insist that Elasticsearch is not up to their tasks
      - Despite the fact that the search ranking data is used for their model training
      - Search re-ranking is done outside of Elasticsearch (e.g. operational complexity)
      - The default Elasticsearch installation offers very few tools for search relevance work