The architecture of search engines
in Booking.com
Kang-min Liu | 2017-03-09
Amsterdam
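About Booking.com
Booking.com B.V., part of the Priceline Group (NASDAQ: PCLN), owns and operates
Booking.com™, the world leader in online accommodation reservations. On average, more
than 1,200,000 room nights are booked on Booking.com every day. Visitors to the
Booking.com website and apps come from all over the world and span both the leisure and
business travel markets.
Founded in 1996, Booking.com B.V. has always offered every type of accommodation at the
best prices, from small family-run B&Bs and business apartments to five-star luxury
suites. Booking.com is international in outlook: the site is available in more than 40
languages and lists 1,160,281 partner properties across 226 countries and territories.
https://www.booking.com/content/about.zh-tw.html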
Problems (Tech.)
Data Volume
● Location
○ Cities + POIs: 3M
○ Hotels: 1.2M
● Reservation
○ 1.2M per day
● Hotel Reviews
○ 100M
● Availability
○ 52B
Location
Search
● Input
○ Free Text
● Result
○ Hotel ID
○ City ID
○ Lat/Lon
Difficulties
● Names are short
○ Stopwords do not apply
● Multi-language
● High ambiguity
● Multi-meaning words
○ Park Hotel
○ Park City
○ City Hotel
● Local names
○ USJ = 環球影城 (Universal Studios Japan)
Solution (pre-2011)
● MySQL
○ SELECT id FROM City
WHERE name LIKE '%London%'
● Pros
○ Easy to implement
● Cons
○ Sensitive to token order
○ No scoring
○ No partial matching
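A minimal sketch of the token-order and scoring problems with the LIKE query. It uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server, and the table contents and the like_search helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO City (id, name) VALUES (?, ?)",
    [(1, "London"), (2, "New London"), (3, "Park City"), (4, "Londonderry")],
)


def like_search(text):
    """Return ids of cities whose name contains `text` as a substring."""
    rows = conn.execute(
        "SELECT id FROM City WHERE name LIKE ? ORDER BY id", (f"%{text}%",)
    )
    return [row[0] for row in rows]


# All substring hits come back with no relevance order: "London" itself is
# not ranked above "New London" or "Londonderry".
print(like_search("London"))

# Swapping the token order finds nothing, although "Park City" exists.
print(like_search("City Park"))
```

Because the query is a plain substring test, the only way to rank or reorder tokens is in application code, which is the gap the later solutions address.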
Solution (2011-2013)
● Elasticsearch
○ English-biased tokenization rules
○ One index for everything, for all purposes (term suggestion + search)
● Pros
○ Tokenization / partial matching
○ Fast scoring + top-K
● Cons
○ Scoring is optimized for a long corpus. Difficult to tweak.
○ Machine downtime management
Solution (2013 to now)
● Brick
○ In-house search engine: simply a TCP server on top of Lucene.
○ One document per translation.
○ 8 shards / 5 replicas.
○ Term suggestion + auto-correction + classification
● Pros
○ Control over the scoring of each token
○ Control over system deployment
● Cons
○ Tightly tailored to our specific problem
[Diagram labels: Web; search nodes grouped into Replica 0, Replica 1, … Replica M; Materialized Location × Translation; Location + Translation]
Availability (AV)
Search
● Input
○ Where – city, country, region
○ When – check-in date
○ How long – check-out date
○ What – search options (stars,
price range, etc.)
● Result
○ Available hotels
Inverted index #pre-2011
● LAMP stack (P = Perl)
● Normalized, optimized dataset
● Search ~ MySQL filter + Perl sort
● Single search worker per query
● High time complexity
● Large cities are unsearchable
[Diagram labels: Inventory; Search]
Pre-computed AV #2011+
● materialized dataset
● read-optimized databases (AV)
○ aim for constant time fetch
● Single search worker
● Failed with inventory growth
● Failed on big search
[Diagram labels: Search; AV (several); Inventory; Materialization]
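One way to picture the materialization step is as precomputing a price for every bookable stay, so that a search becomes a single key lookup. The sketch below is an assumption about the general idea, not the actual schema: the inventory layout, the materialize and fetch helpers, and the prices are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical inventory: nightly price per (hotel_id, night).
# A missing night means the hotel is sold out that night.
inventory = {
    (1, date(2017, 3, 9)): 80,
    (1, date(2017, 3, 10)): 90,
    (1, date(2017, 3, 11)): 100,
    (2, date(2017, 3, 9)): 60,
    (2, date(2017, 3, 11)): 70,
}


def materialize(inventory, max_nights=3):
    """Precompute the total price of every bookable (hotel_id, check_in, nights)."""
    av = {}
    for hotel_id, check_in in inventory:
        total = 0
        for nights in range(1, max_nights + 1):
            night = check_in + timedelta(days=nights - 1)
            price = inventory.get((hotel_id, night))
            if price is None:
                break  # a sold-out night rules out every longer stay
            total += price
            av[(hotel_id, check_in, nights)] = total
    return av


av = materialize(inventory)


def fetch(hotel_id, check_in, nights):
    """One dictionary lookup at query time; None means not available."""
    return av.get((hotel_id, check_in, nights))


print(fetch(1, date(2017, 3, 9), 3))
print(fetch(2, date(2017, 3, 9), 2))
```

The lookup is constant time, but the materialized table grows with hotels × check-in dates × lengths of stay, which is consistent with the slide's note that this approach failed as inventory grew.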
Volume of AV
“The brand’s global dominance cannot be overstated: It works with
approximately 800,000 partners, offering an average of 3 room
types, 2+ rates, 30 different length of stays across 365 arrival days,
which yields something north of 52 billion price points at any given
time.”
https://www.forbes.com/sites/jonathansalembaskin/2015/09/24/booking-com-channels-its-inner-geek-toward-engagement/
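The 52 billion figure can be checked by multiplying the factors in the quote. The variable names below are illustrative; taking "2+ rates" as exactly 2 gives a lower bound.

```python
# Factors taken from the Forbes quote; "2+ rates" is counted as 2 (lower bound).
partners = 800_000
room_types = 3
rates = 2
lengths_of_stay = 30
arrival_days = 365

price_points = partners * room_types * rates * lengths_of_stay * arrival_days
print(f"{price_points:,}")  # 52,560,000,000
```

With more than two rates per room type the total only grows, hence "something north of 52 billion".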
Map-Reduce #2014+
● Parallelized search
○ multiple workers per query
● Multiple MR phases
● Search-as-a-service
○ Plus all the pros and cons of services
● World search: 20s
● Overheads: IPC, serialization
[Diagram labels: inv; Materialization; AV (several); Web server; MR (several)]
MR + LocalAV #2015+
● Data in RAM
○ Bring code to data
● Java
○ reduce constant factor
■ Distance for 100K hotels
● Perl: 0.4s
● Java: 0.04s
○ multi-thread
■ smaller overhead than IPC
[Diagram labels: inv; Materialization; Web server (scatter-gather); SmartAV nodes, each holding MR and AV]
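For a sense of the workload behind the "distance for 100K hotels" benchmark, here is a sketch of a per-hotel distance pass. The slide does not say which formula was used, so the haversine great-circle formula, the random coordinates, and the search center are all assumptions, and the sketch makes no claim about timings.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp against floating-point overshoot before taking the arcsine.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical data: 100K random hotel coordinates and a search center.
random.seed(0)
hotels = [(random.uniform(-90, 90), random.uniform(-180, 180)) for _ in range(100_000)]
center = (52.3702, 4.8952)

distances = [haversine_km(center[0], center[1], lat, lon) for lat, lon in hotels]
print(len(distances), "distances computed")
```

Every search that sorts by distance repeats this loop over its candidate hotels, so a constant-factor gain in the inner arithmetic carries straight through to query latency.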
[Diagram: AV search cluster]
● Web service
● Coordinators
○ scatter-gather
○ random choice of replica
○ retry if necessary
○ ping nodes
● AVsearch nodes
○ static sharding: hotel_id mod N (shard0, shard1, … shardN)
○ Replica 0, Replica 1, … Replica M; replicas are equivalent
○ in-memory indices
○ AV persisted
○ updates from the last few hours
● inv → Materialization → queues for materializing availability
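The coordinator behaviour named in the diagram (static sharding by hotel_id mod N, a random replica per shard, retry on failure) can be sketched as follows. The Replica class, the ReplicaDown exception, the cluster sizes, and the sequential loop are hypothetical simplifications; a real coordinator would query the shards in parallel.

```python
import random

NUM_SHARDS = 4    # "N" in the diagram; the value is an assumption
NUM_REPLICAS = 3  # replicas per shard; the value is an assumption


def shard_of(hotel_id):
    """Static sharding: a hotel always lives on shard hotel_id mod N."""
    return hotel_id % NUM_SHARDS


class ReplicaDown(Exception):
    pass


class Replica:
    """Hypothetical search node holding the hotels of one shard."""

    def __init__(self, shard, hotels, alive=True):
        self.shard, self.hotels, self.alive = shard, hotels, alive

    def search(self, predicate):
        if not self.alive:
            raise ReplicaDown(f"shard {self.shard}")
        return [h for h in self.hotels if predicate(h)]


def build_cluster(hotel_ids, dead=()):
    """Create equivalent replicas per shard; `dead` lists (shard, replica) pairs."""
    hotel_ids = list(hotel_ids)
    cluster = []
    for shard in range(NUM_SHARDS):
        hotels = [h for h in hotel_ids if shard_of(h) == shard]
        cluster.append(
            [Replica(shard, hotels, alive=(shard, r) not in dead) for r in range(NUM_REPLICAS)]
        )
    return cluster


def scatter_gather(cluster, predicate):
    """Ask one random live replica of every shard, then combine the answers."""
    results = []
    for replicas in cluster:
        candidates = list(replicas)
        random.shuffle(candidates)  # random choice of replica
        for replica in candidates:
            try:
                results.extend(replica.search(predicate))
                break
            except ReplicaDown:
                continue  # retry on another replica of the same shard
        else:
            raise RuntimeError("no live replica for a shard")
    return sorted(results)


cluster = build_cluster(range(20), dead={(0, 0), (0, 1)})
print(scatter_gather(cluster, lambda hotel_id: hotel_id % 2 == 0))
```

Because replicas of a shard are equivalent, any of them can answer, and losing one costs a retry rather than a failed search.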
Local AV
● Statically sharded (hotel_id mod k)
● Hotel data
○ Updated hourly
○ Kept in RAM. Not persisted, but easy to fetch and rebuild from MySQL.
● Availability data
○ Persisted
○ Real-time updates
○ RocksDB
Application
● Filter
○ Search criteria: stars / WiFi / parking etc.
○ Group matching: rooms wanted, persons per room
○ Availability: check-in and check-out dates
● Sort
○ By price, distance, review score
● Top-K
● Merge
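The filter, sort, top-K, and merge steps can be sketched with the standard library. The hotel records, field names, and the local_top_k and merge_top_k helpers are illustrative assumptions; only price sorting and two filters are shown.

```python
import heapq
import itertools


def local_top_k(hotels, k, min_stars):
    """Per shard: filter by criteria and availability, then keep the k cheapest."""
    matches = (h for h in hotels if h["stars"] >= min_stars and h["available"])
    return heapq.nsmallest(k, matches, key=lambda h: h["price"])  # sorted by price


def merge_top_k(per_shard_results, k):
    """Coordinator: merge the already sorted shard lists and keep the global top k."""
    merged = heapq.merge(*per_shard_results, key=lambda h: h["price"])
    return list(itertools.islice(merged, k))


# Hypothetical hotel records on two shards.
shard0 = [
    {"id": 1, "stars": 4, "price": 120, "available": True},
    {"id": 2, "stars": 3, "price": 60, "available": True},
    {"id": 3, "stars": 5, "price": 200, "available": False},
]
shard1 = [
    {"id": 4, "stars": 4, "price": 90, "available": True},
    {"id": 5, "stars": 5, "price": 150, "available": True},
    {"id": 6, "stars": 4, "price": 110, "available": True},
]

partial = [local_top_k(shard, k=2, min_stars=4) for shard in (shard0, shard1)]
top = merge_top_k(partial, k=2)
print([h["id"] for h in top])
```

Each shard returns at most k rows, so the coordinator merges at most k rows per shard regardless of how many hotels matched locally.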
Result
● MR search
vs.
MR search + local AV + new tech stack
● Adriatic coast (~30K hotels)
○ before 13s, after 30ms
● Rome (~6K hotels)
○ before 5s, after 20ms
● Sofia (~0.3K hotels)
○ before 200ms, after 10ms
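Expressed as speedup factors, the before/after timings work out as follows. The numbers are the ones on the slide, converted to milliseconds; the variable names are illustrative.

```python
# (region, before_ms, after_ms) as reported on the slide
results = [
    ("Adriatic coast", 13_000, 30),
    ("Rome", 5_000, 20),
    ("Sofia", 200, 10),
]

speedups = {region: before / after for region, before, after in results}
for region, factor in speedups.items():
    print(f"{region}: about {factor:.0f}x faster")
```

The gain is largest for the biggest search, roughly 433x for the Adriatic coast compared with 20x for Sofia.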
Conclusion
One more thing...
We are hiring
workingatbooking.com
eugenia.kondryn@booking.com
anetta.derradivojevic@booking.com
Thank you :)
