Lessons Learned While Scaling Elasticsearch at Vinted
$ whoami
{
"name": "Dainius Jocas",
"company": {
"name": "Vinted",
"mission": "Make second-hand the first choice worldwide"
},
"role": "Staff Engineer",
"website": "https://www.jocas.lt",
"twitter": "@dainius_jocas",
"github": "dainiusjocas",
"author_of_oss": ["lucene-grep"]
}
Agenda
1. Intro
2. Scale as of January 2020
3. IDs as keywords
4. Dates
5. function_score
6. Scale as of January 2021
7. Discussion
Intro: Vinted and Elasticsearch
- Vinted is a second-hand clothes marketplace
- Operates in 10+ countries, 10+ languages
- Elasticsearch has been in use since 2014 (starting with Elasticsearch 1.4.1)
- Today I’ll share the lessons learned while scaling Elasticsearch at Vinted
Elasticsearch Scale @Vinted as of January 2020 (1)
- Elasticsearch 5.8.6 (~end-of-life at that time)
- 1 cluster of ~420 data nodes (each with 14 CPUs and 64 GB RAM, bare metal)
- ~300K requests per minute (RPM) during peak hours
- ~160M documents
- 84 shards with 4 replicas
- p99 latency during peak hours was ~250 ms
- The number of Slow Log queries (i.e. latency > 100 ms) skyrocketed during peak hours
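A quick back-of-the-envelope check of the figures above, as a sketch. It assumes all ~160M documents live in one index and that shards are allocated evenly. Under those assumptions, 84 primaries with 4 replicas each give 420 shard copies, which lines up with ~420 data nodes, i.e. roughly one shard copy per node.

```python
# Figures from the January 2020 slide above.
PRIMARY_SHARDS = 84
REPLICAS = 4
DATA_NODES = 420
DOCUMENTS = 160_000_000
PEAK_RPM = 300_000


def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies: every primary plus its `replicas` replica copies."""
    return primaries * (1 + replicas)


copies = shard_copies(PRIMARY_SHARDS, REPLICAS)
copies_per_node = copies / DATA_NODES        # assumes even shard allocation
docs_per_shard = DOCUMENTS / PRIMARY_SHARDS  # assumes one index holds all documents
peak_rps = PEAK_RPM / 60                     # requests per second at peak

print(f"shard copies: {copies}")  # 420
print(f"shard copies per data node: {copies_per_node:.1f}")  # 1.0
print(f"documents per primary shard: {docs_per_shard:,.0f}")
print(f"peak requests per second: {peak_rps:,.0f}")  # 5,000
```

At one copy per node, adding nodes alone no longer spreads a single query's work any further. This is consistent with the later point that sharding eventually stops helping.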
Elasticsearch Scale @Vinted as of January 2020 (2)
- The company saw Elasticsearch as a risk(!)
- Back-end developers didn’t want to touch it
- Usage of the Vinted platform was expected to at least double by October
(with a similar increase in Elasticsearch load, and simply adding more servers was a bad idea)
- Functionality on top of Elasticsearch had just accumulated over the years, with no
oversight and no clear ownership
- SREs were on Elasticsearch duty (hint: restarting servers doesn’t help all that
much when the Elasticsearch cluster is overloaded)
Replayed 10k queries during the “easy Tuesday”
Adventures
IDs as keywords (1)
{
"query": {
"bool": {
"filter": [
{
"terms": {
"country_id": [2,4]
}
}
]
}
}
}
IDs as keywords (2): context
- Elasticsearch indexed data from MySQL (see the vinted.engineering blog for
details)
- Common practice for Ruby on Rails apps is to create database tables with
primary keys as auto-increment integers
- Q: Which Elasticsearch data type to use?
- A: integer, because why not?
IDs as keywords (3)
IDs as keywords (4): TL;DR of the blog post
- Before: integers were indexed as padded string terms
- After: integers indexed as block k-d tree (BKD)
- The Lucene change landed in Elasticsearch in 5.0
- Numeric data types were optimized for range queries
- Numeric data types still support terms queries
IDs as keywords (5): from Vinted point of view
- On IDs we don’t do range queries
- We use IDs for simple terms filtering
- The “optimized” integer data type for our use case degraded performance
- How much?
- For our workload, switching the IDs to keyword gave an instant ~15% (~20 ms) decrease in p99 latency
- We use around 10 such fields for filtering in every search query
- The required change was as simple as changing the index mappings and
reindexing the data
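The mapping change plus reindex can be sketched as two request bodies. This is a minimal sketch, not Vinted's actual code. It assumes Elasticsearch 7.x typeless mappings (5.x needs an extra mapping-type level under `mappings`). The index names `items_v1`/`items_v2` and the fields `brand_id`/`catalog_id` are hypothetical. The Elasticsearch "Tune for search speed" guide gives the same advice to map identifiers as `keyword`.

```python
import json

# Hypothetical ID fields; the talk mentions ~10 such fields used for filtering.
ID_FIELDS = ["country_id", "brand_id", "catalog_id"]


def keyword_id_mappings(fields):
    """Body for PUT <new index>: map ID fields as keyword (term lookups)
    instead of integer (BKD points, optimized for range queries)."""
    return {"mappings": {"properties": {f: {"type": "keyword"} for f in fields}}}


def reindex_body(source_index, dest_index):
    """Body for POST _reindex: copy all documents into the re-mapped index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


# PUT items_v2  (hypothetical index name)
print(json.dumps(keyword_id_mappings(ID_FIELDS), indent=2))
# POST _reindex
print(json.dumps(reindex_body("items_v1", "items_v2"), indent=2))
```

The existing `terms` filter on `country_id` should keep working unchanged, because the numeric values in the query are matched against the keyword terms.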
IDs as keywords (6): summary
- Remember that Vinted has used Elasticsearch since before 5.0
- At that time it was OK to index IDs as Elasticsearch integers
- After 5.0, IDs indexed as integers became a performance issue
- Such a change breaks nothing, so it can easily slip under regular
developers’ radar and then backfire badly
- Regular developers then conclude that Elasticsearch performs badly
Dates
- Date math
- Date filters
Date Math (1)
{
"query": {
"bool": {
"filter": [
{
"range": {
"created_at": {
"gte": "now-7d"
}
}
}
]
}
}
}
Date Math (2)
- Most queries that use `now` (see Date Math) cannot be cached
- From a developer’s point of view it is simple: you just hardcode `now-7d`
- If most of your queries use Date Math, CPU usage increases
- With more traffic, the cluster starts to develop “hot nodes” and request queueing
- My advice: always use absolute timestamps in production
- Cached queries -> massive gains in cluster throughput
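Replacing `now-7d` with a timestamp computed on the client could look like the sketch below. It assumes a Python client and a 5-minute rounding granularity, both chosen for illustration. The value is rounded down so that every request in the same window sends an identical query, which is what makes it cacheable.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def round_down(ts: datetime, granularity: timedelta = timedelta(minutes=5)) -> datetime:
    """Round an aware UTC datetime down to a multiple of `granularity` since the epoch."""
    step = int(granularity.total_seconds())
    seconds = int((ts - EPOCH).total_seconds())
    return EPOCH + timedelta(seconds=seconds - seconds % step)


def created_within_last(days: int, now: datetime) -> dict:
    """Same filter as {"gte": "now-<days>d"}, but with an absolute, rounded timestamp."""
    lower = round_down(now) - timedelta(days=days)
    return {
        "query": {"bool": {"filter": [
            {"range": {"created_at": {"gte": lower.strftime("%Y-%m-%dT%H:%M:%SZ")}}}
        ]}}
    }


# Two requests a few minutes apart inside one 5-minute window produce the same query.
q1 = created_within_last(7, datetime(2021, 5, 27, 9, 56, 10, tzinfo=timezone.utc))
q2 = created_within_last(7, datetime(2021, 5, 27, 9, 59, 59, tzinfo=timezone.utc))
print(q1 == q2)  # True
```

In production, `now` would come from `datetime.now(timezone.utc)`. A coarser granularity means more cache hits, at the cost of a slightly stale lower bound.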
Date filters (1)
{
"query": {
"bool": {
"filter": [
{
"range": {
"created_at": {
"lte": "2021-05-27T09:55:00Z"
}
}
}
]
}
}
}
Date filters (2)
- This query clause asks Elasticsearch to “collect docs that are not newer than
X”
- What if X is meant to be now and your documents have accumulated over
the last 10 years?
- Then this filter matches ~99% of all docs in your indices
- Not a good “filter”
Date filters (3)
{
"query": {
"bool": {
"must_not": [
{
"range": {
"created_at": {
"gt": "2021-05-27T09:55:00Z"
}
}
}
]
}
}
}
Date filters (4)
- This query clause asks Elasticsearch to “collect docs that are not newer than
X”
- What if X is now and your documents have accumulated over 10 years?
- Then the range clause inside must_not matches only ~1% of all docs in your indices
- A good “filter”: the more specific a filter is, the better
- From the Elasticsearch docs: “if you must filter by timestamp, use a coarse granularity (e.g.
round timestamp to 5 minutes) so the query value changes infrequently”
- For our setup it reduced p99 latency by ~15% (~10 ms)
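The selectivity difference between the two formulations can be illustrated with a toy, in-memory simulation. This is not Elasticsearch itself. The corpus (one document per day over ~10 years) and the cutoff are made up. The simulation only shows that the two clauses return the same documents while the inner range of the `must_not` version matches far fewer of them. One caveat is worth checking: the two forms are equivalent only if every document has `created_at`, because a `must_not` range keeps documents where the field is missing.

```python
# Hypothetical corpus: created_at is a day number spread over ~10 years (0..3649).
docs = [{"created_at": day} for day in range(3650)]
X = 3613  # cutoff: only the newest ~1% of documents are newer than X


def in_range(doc, gt=None, lte=None):
    """Mimic a range clause: a document without the field never matches."""
    v = doc.get("created_at")
    if v is None:
        return False
    return (gt is None or v > gt) and (lte is None or v <= lte)


# A: {"filter": [{"range": {"created_at": {"lte": X}}}]}
a_clause_hits = [d for d in docs if in_range(d, lte=X)]
# B: {"must_not": [{"range": {"created_at": {"gt": X}}}]}
b_clause_hits = [d for d in docs if in_range(d, gt=X)]
b_results = [d for d in docs if not in_range(d, gt=X)]

print(len(a_clause_hits) / len(docs))  # ~0.99: the clause in A matches almost everything
print(len(b_clause_hits) / len(docs))  # ~0.01: the clause in B is highly selective
print(a_clause_hits == b_results)      # True: same final result set

# Caveat: a document with no created_at is excluded by A but kept by B.
missing = {}
print(in_range(missing, lte=X), not in_range(missing, gt=X))  # False True
```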
Dates: summary
- Don’t use Date Math in production
- Write timestamp filters so that they match as few documents as possible
Function score (1)
{
"query": {
"function_score": {
"functions": [
{
"script_score": {
"script": {
"params": {
"now": 1579605387087
},
"source": "now > doc['user_updated_at'].value ? 1 / ln(now - doc['user_updated_at'].value + 1) : 0",
"lang": "expression"
}
}
}
]
}
}
}
Function score (2)
- Models boosting on novelty
- Multiplies _score
- Why a custom script score? Why not some decay function?
- Decay functions performed worse in our benchmarks
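To see how this boost behaves, here is a Python port of the expression script above. It is a sketch for inspecting the curve, not code that runs inside Elasticsearch. Timestamps are epoch milliseconds, as in the `now` param. The ages printed are arbitrary examples. Elasticsearch evaluates this once per matching hit, which is why it gets expensive on queries with many hits.

```python
import math

MS_PER_DAY = 24 * 60 * 60 * 1000


def novelty_boost(now_ms: int, user_updated_at_ms: int) -> float:
    """Port of the script: 1 / ln(age_ms + 1) for past timestamps, otherwise 0."""
    if now_ms > user_updated_at_ms:
        return 1 / math.log(now_ms - user_updated_at_ms + 1)
    return 0.0


now = 1579605387087  # the "now" param from the query above (epoch millis)
for days in (1, 7, 30, 365, 3650):
    print(days, round(novelty_boost(now, now - days * MS_PER_DAY), 4))

# With function_score's default boost_mode ("multiply"), the final score is
# roughly _score * novelty_boost, computed for every hit.
```

Because of the logarithm, the boost for a day-old document and a ten-year-old document differ only by a small factor.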
Function score (3): problems
- This clause was the biggest contributor to the query latency
- The script is executed on every hit
- Problematic for queries that have many hits, i.e. > 10k
- Sharding helped keep the latency down (remember we had 84 shards for
160M documents)
- But at some point sharding turns into oversharding and no longer helps
- The task was to come up with a way to replace the function_score with a
more performant solution while preserving the ranking logic
Function score (4): distance_feature
{
"query": {
"bool": {
"should": [
{
"distance_feature": {
"field": "user_updated_at",
"origin": "2021-05-27T10:59:00Z",
"pivot": "7d"
}
}
]
}
}
}
Function score (5): distance_feature
- A similar boost on novelty
- The multiplicative boost turned into an additive boost (a big change in ranking logic)
- Unlike the function_score query or other ways to change relevance scores,
the distance_feature query efficiently skips non-competitive hits
(remember those 10-year-old documents)
- NOTE: the origin value is not set to `now`, to avoid Date Math!
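The Elasticsearch docs give the distance_feature score as `boost * pivot / (pivot + distance)`. The sketch below evaluates that formula for the `"pivot": "7d"` example. It uses a hypothetical text score to show the additive combination a `bool` `should` clause produces. The contribution is bounded by `boost`, whereas the old script's factor scaled the whole `_score`.

```python
DAY_MS = 24 * 60 * 60 * 1000


def distance_feature_score(distance_ms: float, pivot_ms: float, boost: float = 1.0) -> float:
    """Documented distance_feature formula: boost * pivot / (pivot + distance)."""
    return boost * pivot_ms / (pivot_ms + abs(distance_ms))


pivot = 7 * DAY_MS  # "pivot": "7d"
for days in (0, 7, 14, 365):
    print(days, round(distance_feature_score(days * DAY_MS, pivot), 3))

# A bool "should" clause adds its score to the scores of the other matching clauses.
text_score = 12.0  # hypothetical BM25 _score of a matching document
print(text_score + distance_feature_score(7 * DAY_MS, pivot))  # 12.5
```

A document exactly one pivot away from the origin gets half of the maximum boost. That is a convenient way to reason about which `pivot` to choose.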
Function score (6): summary
- Function score provides flexible scoring options
- However, you must be careful when applying it to big indices
- Function score is applied on all hits
- No way for Elasticsearch to skip hits
- Caching is the single most important thing for good Elasticsearch performance, and the
function score query doesn’t play well with it
- The distance_feature query clause might be an option to replace function_score
if you have performance issues
Elasticsearch Scale @Vinted as of January 2021 (1)
- Elasticsearch 7.9.3
- 3 clusters of ~160 data nodes each (each node with 16 CPUs and 64 GB RAM, bare metal)
- An offline cluster of similar size for testing (upgrades, cluster setup, etc.)
- ~1000K RPM during peak hours
- ~360M documents
- p99 latency during peaks ~150 ms
- Timeouts (>500ms) are 0.0367% of all queries
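The January 2020 and January 2021 figures can be compared with a bit of arithmetic. This is a rough sketch. The per-node averages assume traffic is spread evenly over all production data nodes, with the offline cluster excluded. The timeout estimate assumes the overall 0.0367% rate also holds at peak, which the slide does not state.

```python
# Figures from the two "Scale @Vinted" slides.
scale = {
    2020: {"data_nodes": 1 * 420, "peak_rpm": 300_000, "docs": 160_000_000},
    2021: {"data_nodes": 3 * 160, "peak_rpm": 1_000_000, "docs": 360_000_000},
}

for year, s in scale.items():
    rps = s["peak_rpm"] / 60  # requests per second at peak
    print(year, f"{rps:,.0f} req/s total", f"{rps / s['data_nodes']:.1f} req/s per data node")

# Assumes the overall timeout rate (0.0367% of queries) also applies at peak.
timeouts_per_min = 1_000_000 * 0.0367 / 100
print(f"~{timeouts_per_min:.0f} timed-out queries per minute at peak")  # ~367
```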
Elasticsearch Scale @Vinted as of January 2021 (2)
- A team of 8 people is responsible for Elasticsearch
- Regular capacity testing for 2x load in terms of
- document count
- query throughput
- Elasticsearch is seen as the system that can handle the growth
- New functionality is performance-tested before it is released to production
- The team members rotate on duty
- Keeping clusters operational
- Maintenance tasks from backlog
- Is everything perfect? No.
- Elasticsearch is resource hungry
- Version upgrades still have to be checked before releasing
- Offline testing cluster helps with that
- Machine Learning engineers insist that Elasticsearch is not suited for their tasks
- Even though search ranking data is used to train their models
- Search re-ranking is done outside of Elasticsearch (which adds operational complexity)
- A default Elasticsearch installation offers very few tools for search relevance
work
Discussion (1)
Discussion (2)
Thank You!

More Related Content

What's hot

Apache kafka
Apache kafkaApache kafka
Apache kafka
Daan Gerits
 
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
Sadayuki Furuhashi
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
Spark Summit
 
SSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine LearningSSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine Learning
felixcss
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
wyukawa
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Databricks
 
Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small Start
Hiroshi Toyama
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
Databricks
 
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Citus Data
 
Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...
Databricks
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
Databricks
 
Multi dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesMulti dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframes
Romi Kuntsman
 
Natural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache SparkNatural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache Spark
Databricks
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Till Rohrmann
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_en
Ogibayashi
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS Athena
Yaroslav Tkachenko
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
Sadayuki Furuhashi
 
Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkr
Databricks
 
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Databricks
 

What's hot (20)

Apache kafka
Apache kafkaApache kafka
Apache kafka
 
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
 
SSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine LearningSSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine Learning
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
 
Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small Start
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
 
Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Multi dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesMulti dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframes
 
Natural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache SparkNatural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache Spark
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_en
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS Athena
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
 
Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkr
 
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
 

Similar to Lessons Learned While Scaling Elasticsearch at Vinted

Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
Amit Juneja
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Spark Summit
 
SF Big Analytics meetup : Hoodie From Uber
SF Big Analytics meetup : Hoodie  From UberSF Big Analytics meetup : Hoodie  From Uber
SF Big Analytics meetup : Hoodie From Uber
Chester Chen
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
Tomas Sirny
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
Andrew Lamb
 
Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch Clusters
Sematext Group, Inc.
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling Elasticsearch
Sylvain Wallez
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearch
Minsoo Jun
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Spark Summit
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Jie Li
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Databricks
 
ElasticSearch.pptx
ElasticSearch.pptxElasticSearch.pptx
ElasticSearch.pptx
TrnHiu748002
 
Viadeos Segmentation platform with Spark on Mesos
Viadeos Segmentation platform with Spark on MesosViadeos Segmentation platform with Spark on Mesos
Viadeos Segmentation platform with Spark on Mesos
Cepoi Eugen
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
DataStax Academy
 
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseThe Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
Nikolay Samokhvalov
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
琛琳 饶
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
MongoDB APAC
 
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
Lviv Startup Club
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET Developers
Ben van Mol
 

Similar to Lessons Learned While Scaling Elasticsearch at Vinted (20)

Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
 
SF Big Analytics meetup : Hoodie From Uber
SF Big Analytics meetup : Hoodie  From UberSF Big Analytics meetup : Hoodie  From Uber
SF Big Analytics meetup : Hoodie From Uber
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
 
Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch Clusters
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling Elasticsearch
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearch
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
ElasticSearch.pptx
ElasticSearch.pptxElasticSearch.pptx
ElasticSearch.pptx
 
Viadeos Segmentation platform with Spark on Mesos
Viadeos Segmentation platform with Spark on MesosViadeos Segmentation platform with Spark on Mesos
Viadeos Segmentation platform with Spark on Mesos
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
 
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseThe Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
 
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET Developers
 

Recently uploaded

University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
fkyes25
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 

Recently uploaded (20)

University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理

Lessons Learned While Scaling Elasticsearch at Vinted

  • 12. IDs as keywords (4): TL;DR of the blog post
    - Before: integers were indexed as encoded string terms in the inverted index
    - After: integers are indexed as points in a block k-d tree (BKD)
    - The Lucene change landed in Elasticsearch 5.0
    - Numeric data types were optimized for range queries
    - Numeric data types still support terms queries
  • 13. IDs as keywords (5): from Vinted's point of view
    - We don't run range queries on IDs
    - We use IDs only for simple terms filtering
    - For our use case, the “optimized” integer data type degraded performance
    - How much?
    - For our workload, switching to keyword gave an instant ~15% (~20 ms) decrease in p99 latency
    - We use around 10 such fields for filtering in every search query
    - The required change was as simple as changing the index mappings and reindexing the data
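A minimal sketch of the mapping change, written as Python dicts rather than client calls. Only `country_id` comes from the slides; `brand_id` and `catalog_id` are hypothetical stand-ins for the other ID fields, and the helper names are made up for illustration. The body uses the typeless mapping format of Elasticsearch 7.x (a 5.x index would also need a mapping type name). An existing field's type cannot be changed in place, so this mapping would go into a new index that the data is reindexed into.

```python
import json

# Hypothetical ID fields: only country_id appears in the slides; the others
# stand in for the "around 10 such fields" used in every search query.
ID_FIELDS = ["country_id", "brand_id", "catalog_id"]


def keyword_id_mappings(fields):
    """Mapping body that indexes each ID field as `keyword` (inverted-index
    terms) instead of `integer` (BKD points). Typeless 7.x format."""
    return {"mappings": {"properties": {f: {"type": "keyword"} for f in fields}}}


def ids_terms_filter(field, ids):
    """Terms filter on a keyword ID field. IDs are sent as strings so they
    match the indexed keyword terms exactly."""
    return {"query": {"bool": {"filter": [{"terms": {field: [str(i) for i in ids]}}]}}}


if __name__ == "__main__":
    print(json.dumps(keyword_id_mappings(ID_FIELDS), indent=2))
    print(json.dumps(ids_terms_filter("country_id", [2, 4])))
```

The query side barely changes: the same `terms` filter from the first IDs slide works against the keyword field. Only the index-side data structure used to answer it is different.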
  • 14. IDs as keywords (6): summary
    - Remember that Vinted has used Elasticsearch since before 5.0
    - At that time it was fine to index IDs as Elasticsearch integers
    - After 5.0, IDs indexed as integers became a performance issue
    - A change like this breaks nothing, so it can easily slip under regular developers' radar and then backfire badly
    - Regular developers then conclude that Elasticsearch performs badly
  • 15. Dates
    - Date Math
    - Date filters
  • 16. Date Math (1)
    { "query": { "bool": { "filter": [ { "range": { "created_at": { "gte": "now-7d" } } } ] } } }
  • 17. Date Math (2)
    - Most queries that use `now` (see Date Math) cannot be cached
    - From a developer's POV it is simple: you just hardcode `now-7d`
    - If most of your queries use Date Math, CPU usage increases
    - With more traffic, the cluster starts to have “hot nodes” and request queueing
    - My advice: always use timestamps in production
    - Cached queries -> massive gains in cluster throughput
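One way to follow that advice is to have the client compute the timestamp instead of sending `now-7d`. The sketch below assumes the client clock is in UTC and rounds down to 5 minutes. That granularity is borrowed from the documentation quote on the Date filters slides and is a tradeoff, not a measured optimum. The function names are hypothetical. Every request inside the same 5-minute window sends an identical clause, which is what gives the cache a chance to be reused.

```python
from datetime import datetime, timedelta, timezone


def round_down(ts: datetime, granularity: timedelta) -> datetime:
    """Round a timezone-aware datetime down to a multiple of `granularity` (UTC)."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    step = int(granularity.total_seconds())
    seconds = int(ts.timestamp())
    return datetime.fromtimestamp(seconds - seconds % step, tz=timezone.utc)


def created_in_last_days(now: datetime, days: int,
                         granularity: timedelta = timedelta(minutes=5)) -> dict:
    """Replacement for {"gte": "now-<days>d"}: an absolute, rounded timestamp
    computed by the client, so repeated requests send the same clause."""
    since = round_down(now, granularity) - timedelta(days=days)
    return {"range": {"created_at": {"gte": since.strftime("%Y-%m-%dT%H:%M:%SZ")}}}


if __name__ == "__main__":
    now = datetime(2021, 5, 27, 9, 57, 13, tzinfo=timezone.utc)
    print(created_in_last_days(now, 7))
    # {'range': {'created_at': {'gte': '2021-05-20T09:55:00Z'}}}
```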
  • 18. Date filters (1)
    { "query": { "bool": { "filter": [ { "range": { "created_at": { "lte": "2021-05-27T09:55:00Z" } } } ] } } }
  • 19. Date filters (2)
    - This query clause asks Elasticsearch to “collect docs that are not newer than X”
    - What if X is meant to be now and your documents have accumulated over the last 10 years?
    - Then this filter matches ~99% of all docs in your indices
    - Not a good “filter”
  • 20. Date filters (3)
    { "query": { "bool": { "must_not": [ { "range": { "created_at": { "gt": "2021-05-27T09:55:00Z" } } } ] } } }
  • 21. Date filters (4)
    - This query clause also asks Elasticsearch to “collect docs that are not newer than X”, but does it by excluding the docs that are newer than X
    - What if X is now and your documents have accumulated over the last 10 years?
    - Then the range clause matches only ~1% of all docs in your indices
    - A good “filter”, i.e. a more specific filter is a good filter
    - From the docs: “if you must filter by timestamp, use a coarse granularity (e.g. round timestamp to 5 minutes) so the query value changes infrequently”
    - For our setup it reduced the p99 by ~15%, ~10 ms
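The two clauses return the same documents only if every document has `created_at`. A `range` filter never matches a document that lacks the field, but a `must_not` range does not exclude such a document either, so it is kept. The sketch below models that per-document semantics in plain Python so both the equivalence and the caveat can be checked. The helper names are hypothetical, and this is not how Lucene actually evaluates the query.

```python
from datetime import datetime, timezone
from typing import Optional

X = "2021-05-27T09:55:00Z"

# The two query bodies from the Date filters slides.
LTE_FILTER = {"query": {"bool": {"filter": [{"range": {"created_at": {"lte": X}}}]}}}
MUST_NOT_GT = {"query": {"bool": {"must_not": [{"range": {"created_at": {"gt": X}}}]}}}


def matches_lte(created_at: Optional[datetime], x: datetime) -> bool:
    # The range clause itself has to collect every doc older than X
    # (most of an index that spans 10 years).
    return created_at is not None and created_at <= x


def matches_must_not_gt(created_at: Optional[datetime], x: datetime) -> bool:
    # The range clause only collects docs newer than X; must_not drops them.
    # A doc without created_at is not matched by the range, so it is kept.
    return not (created_at is not None and created_at > x)
```

If some documents may lack `created_at`, an additional `exists` filter on the field would restore the exact semantics of the original `lte` filter.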
  • 24. Dates: summary
    - Don't use Date Math in production
    - Write timestamp filters so that they match fewer documents
  • 25. Function score (1)
    { "query": { "function_score": { "functions": [ { "script_score": { "script": { "params": { "now": 1579605387087 }, "source": "now > doc['user_updated_at'].value ? 1 / ln(now - doc['user_updated_at'].value + 1) : 0", "lang": "expression" } } } ] } } }
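The scoring curve is easier to reason about outside Elasticsearch. Here is a minimal Python restatement of the expression. It assumes that `user_updated_at` values reach the script as epoch milliseconds, which is what the millisecond `now` param suggests. The helper names are hypothetical. `boosted` models function_score's default `boost_mode` of `multiply`. In the cluster, this function is effectively evaluated once for every matching document.

```python
import math

NOW_MS = 1579605387087  # the `now` param from the slide (epoch milliseconds)


def novelty_score(now_ms: int, updated_at_ms: int) -> float:
    """Python restatement of the slide's Lucene expression:
    now > updated ? 1 / ln(now - updated + 1) : 0
    """
    if now_ms > updated_at_ms:
        # now - updated >= 1, so the argument to ln is >= 2 and ln(...) > 0.
        return 1.0 / math.log(now_ms - updated_at_ms + 1)
    return 0.0


def boosted(query_score: float, now_ms: int, updated_at_ms: int) -> float:
    # function_score multiplies the query _score by the function value by default.
    return query_score * novelty_score(now_ms, updated_at_ms)


if __name__ == "__main__":
    hour, week = 3_600_000, 7 * 86_400_000
    for age_ms in (hour, week):
        print(age_ms, novelty_score(NOW_MS, NOW_MS - age_ms))
```

Because of the logarithm, the boost decays slowly. An item updated an hour ago scores only about a third higher than one updated a week ago.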
  • 26. Function score (2)
    - Models a novelty boost: recently updated documents score higher
    - Multiplies the _score
    - Why a custom script score? Why not some decay function?
    - Decay functions performed worse in our benchmarks
  • 27. Function score (3): problems
    - This clause was the biggest contributor to query latency
    - The script is executed for every hit
    - This is problematic for queries with many hits, e.g. > 10k
    - Sharding helped keep latency down (remember, we had 84 shards for 160M documents)
    - But at some point sharding turns into oversharding and no longer helps
    - The task was to replace the function_score with a more performant solution while preserving the ranking logic
  • 28. Function score (4): distance_feature
    { "query": { "bool": { "should": [ { "distance_feature": { "field": "user_updated_at", "origin": "2021-05-27T10:59:00Z", "pivot": "7d" } } ] } } }
  • 29. Function score (5): distance_feature
    - A similar boost on novelty
    - The multiplicative boost turned into an additive boost (a big change in logic)
    - Unlike the function_score query or other ways to change relevance scores, the distance_feature query efficiently skips non-competitive hits (remember those 10-year-old documents)
    - NOTE: the origin value is not set to `now` because of Date Math!
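To make the change in logic concrete, here is a Python restatement of the score formula that the Elasticsearch documentation gives for `distance_feature`: `boost * pivot / (pivot + distance)`. It uses the field, origin and pivot from the slide. The helper names are hypothetical. The boost is bounded by `boost` and, as a bool `should` clause, it is added to the other clauses' scores instead of multiplying them.

```python
from datetime import datetime, timedelta, timezone


def distance_feature_score(origin: datetime, value: datetime,
                           pivot: timedelta, boost: float = 1.0) -> float:
    """Documented distance_feature score for date fields:
    boost * pivot / (pivot + distance), with distance = |origin - value|."""
    distance = abs(origin - value)
    return boost * (pivot / (pivot + distance))


def combined_score(query_score: float, feature_score: float) -> float:
    # Scores of matching bool clauses are summed, so the novelty boost is additive.
    return query_score + feature_score


if __name__ == "__main__":
    origin = datetime(2021, 5, 27, 10, 59, tzinfo=timezone.utc)  # origin from the slide
    pivot = timedelta(days=7)
    for days_old in (0, 7, 21):
        value = origin - timedelta(days=days_old)
        print(days_old, distance_feature_score(origin, value, pivot))
    # 0 1.0
    # 7 0.5
    # 21 0.25
```

The pivot is the distance at which the boost drops to half, so `"7d"` means a week-old document gets half the boost of one updated exactly at `origin`.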
  • 30. Function score (6): summary
    - Function score provides flexible scoring options
    - However, you must be careful when applying it to big indices
    - Function score is applied to all hits
    - There is no way for Elasticsearch to skip hits
    - Caching is the single most important thing for good Elasticsearch performance, and the function_score query doesn't play well with it
    - The distance_feature query clause might be an option to replace function_score if you have performance issues
  • 31. Elasticsearch Scale @Vinted as of January 2021 (1)
    - ES 7.9.3
    - 3 clusters of ~160 data nodes each (each node 16 CPU with 64GB RAM, bare metal)
    - An offline cluster of similar size for testing (upgrades, cluster setup, etc.)
    - ~1000K RPM during peak hours
    - ~360M documents
    - p99 latency during peaks ~150 ms
    - Timeouts (>500 ms) are 0.0367% of all queries
  • 32. Elasticsearch Scale @Vinted as of January 2021 (2)
    - A team of 8 people is responsible for Elasticsearch
    - Regular capacity testing for 2x load in terms of:
      - document count
      - query throughput
    - Elasticsearch is seen as a system that can handle the growth
    - New functionality is performance-tested before it is released to production
    - Team members rotate on duty:
      - keeping clusters operational
      - maintenance tasks from the backlog
  • 33. Discussion (1)
    - Is everything perfect? No.
    - Elasticsearch is resource hungry
    - Version upgrades still have to be checked before releasing
      - The offline testing cluster helps with that
    - Machine Learning engineers insist that Elasticsearch is not suited to their tasks
      - Despite the fact that the search ranking data is used for their model training
    - Search re-ranking is done outside of Elasticsearch (which adds operational complexity)
    - The default Elasticsearch installation offers very few tools for search relevance work