Building Search@Airbnb
Mousom Dhar Gupta
Architecture of the Airbnb search backend.

  1. Building Search@Airbnb, Mousom Dhar Gupta
  2. Total Guests: 20,000,000+
     Countries: 190
     Cities: 34,000+
     Castles: 600+
     Listings Worldwide: 1,200,000+
  3. Search
  4. That Awesome Slide Title of Yours
  5. Technical Stack
     - DropWizard as a service framework (incl. Jetty, Jersey, Jackson)
     - ZooKeeper (via SmartStack) for service discovery
     - Lucene for index storage and simple retrieval
     - Built in-house: forward index, real-time indexing, ranking, and advanced filtering
  6. Search Overview
     Diagram labels: Web App; search nodes Search1, Search2, … SearchN; per node: JVM, data, Lucene index, 150 search threads; ~30 replicas of the same index.
  7. Shards
     Diagram labels: search; 8 × Lucene; Combiner; Filtering and Ranking.
     - Each box has 8 shards of the Lucene index.
     - Latency is 50% lower than with a single-shard index.
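The shard-plus-combiner layout can be sketched as a parallel fan-out followed by a top-k merge. This is a minimal sketch, not Airbnb's code. The `Shard` interface, the `Hit` class, and the merge-by-score combiner are hypothetical stand-ins for the Lucene shards and the Combiner box. Each shard is queried on a thread pool, and a min-heap keeps the global top-k.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical search hit: a listing id and its score from one shard.
final class Hit {
    final long listingId;
    final double score;

    Hit(long listingId, double score) {
        this.listingId = listingId;
        this.score = score;
    }
}

// Hypothetical stand-in for one Lucene index shard.
interface Shard {
    List<Hit> search(String query, int k);
}

final class ShardedSearcher {
    private final List<Shard> shards;
    private final ExecutorService pool;

    ShardedSearcher(List<Shard> shards, ExecutorService pool) {
        this.shards = shards;
        this.pool = pool;
    }

    // Fans the query out to every shard in parallel, then merges the
    // per-shard hits into one global top-k (the "combiner" step).
    List<Hit> search(String query, int k) {
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (Shard shard : shards) {
            futures.add(pool.submit(() -> shard.search(query, k)));
        }
        // Min-heap by score: the weakest hit of the current top-k sits at the head.
        PriorityQueue<Hit> topK = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        try {
            for (Future<List<Hit>> future : futures) {
                for (Hit hit : future.get()) {
                    topK.offer(hit);
                    if (topK.size() > k) {
                        topK.poll(); // evict the lowest-scoring hit
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        List<Hit> merged = new ArrayList<>(topK);
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }
}
```

Each shard searches only part of the index, and the shards run concurrently. That is one plausible source of the latency reduction reported on the slide.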
  8. Challenges (Indexing)
     - Bootstrap (creating the index from scratch)
     - Ensuring consistency of the index with ground-truth data in real time
  9. What’s in the Lucene index?
     - Positions of listings, indexed using Lucene’s spatial module (RecursivePrefixTreeStrategy)
     - Categorical and numerical properties such as room type and maximum occupancy
     - Full text (descriptions, reviews, etc.)
     - ~40 fields per listing from a variety of data sources, all updated in real time
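Lucene's RecursivePrefixTreeStrategy indexes shapes as cells of a spatial prefix tree, such as geohash cells or quad cells. Points in the same cell therefore share that cell's prefix. The standard geohash encoding below illustrates the idea with the standard library only. It is not Lucene's implementation, and the `Geohash` class name is hypothetical.

```java
// Illustrative geohash encoder (not Lucene code).
final class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Bisects the longitude/latitude ranges, alternating bits starting with
    // longitude; every 5 bits become one base-32 character.
    static String encode(double lat, double lon, int length) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true;
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            double[] range = lonBit ? lonRange : latRange;
            double coord = lonBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            value <<= 1;
            if (coord >= mid) {
                value |= 1;
                range[0] = mid; // keep the upper half
            } else {
                range[1] = mid; // keep the lower half
            }
            lonBit = !lonBit;
            if (++bits == 5) {
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

A shorter hash is a larger cell. For example, `encode(42.6, -5.6, 5)` gives `ezs42`, and the nearby point (42.61, -5.59) falls in the same 4-character cell `ezs4`. A spatial query can therefore match a small set of cell prefixes.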
  10. Realtime Update
      Diagram: MySQL databases (fraud, calendar, master, …) -> SpinalTap -> Medusa (with its DataStore) -> Search 1, Search 2, … Search N.
  11. SpinalTap
      - Tails binary update logs from MySQL servers (5.6+)
      - Converts changes in any of the tables into actionable objects called “Mutations” (inserts, deletes, updates)
      - Broadcasts them to Medusa using Kafka
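The Mutation idea can be sketched as a small value type. This is a hypothetical illustration: `Mutation`, its `Type` enum, `fromRowChange`, and `changedColumns` are not SpinalTap's API. The slide only says that binlog changes become insert/update/delete mutations broadcast over Kafka. Binlog parsing and Kafka are omitted so the sketch stays self-contained.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical change event in the spirit of SpinalTap's "Mutations".
final class Mutation {
    enum Type { INSERT, UPDATE, DELETE }

    final Type type;
    final String table;
    final Map<String, Object> before; // row image before the change (empty for inserts)
    final Map<String, Object> after;  // row image after the change (empty for deletes)

    private Mutation(Type type, String table, Map<String, Object> before, Map<String, Object> after) {
        this.type = type;
        this.table = table;
        this.before = before;
        this.after = after;
    }

    // Classifies a row change: no "before" image is an insert, no "after" image is a delete.
    static Mutation fromRowChange(String table, Map<String, Object> before, Map<String, Object> after) {
        if (before == null && after == null) {
            throw new IllegalArgumentException("row change has no row images");
        }
        if (before == null) {
            return new Mutation(Type.INSERT, table, Collections.emptyMap(), after);
        }
        if (after == null) {
            return new Mutation(Type.DELETE, table, before, Collections.emptyMap());
        }
        return new Mutation(Type.UPDATE, table, before, after);
    }

    // Columns whose values differ between the two row images (sorted),
    // so a consumer can ignore changes to columns the index does not use.
    Set<String> changedColumns() {
        Set<String> columns = new HashSet<>(before.keySet());
        columns.addAll(after.keySet());
        Set<String> changed = new TreeSet<>();
        for (String column : columns) {
            if (!Objects.equals(before.get(column), after.get(column))) {
                changed.add(column);
            }
        }
        return changed;
    }
}
```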
  13. Medusa
      - Source of truth for search index data
      - Listens to updates from SpinalTap and builds new IndexData by querying ~15 MySQL tables from three different databases
      - Persists everything in a DataStore and broadcasts the latest version to all search nodes
      - Uses ZooKeeper for leader election
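One way for search nodes to stay consistent with a single source of truth is to version each IndexData broadcast and ignore anything older than what a node already holds. This sketch assumes per-listing versions that only increase, assigned by the single elected leader. The `IndexData` fields, the `SearchNodeApplier` class, and the versioning scheme are hypothetical and not taken from the slides.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical versioned payload broadcast by the index builder.
final class IndexData {
    final long listingId;
    final long version; // assumed to increase with every rebuild of this listing
    final String payload;

    IndexData(long listingId, long version, String payload) {
        this.listingId = listingId;
        this.version = version;
        this.payload = payload;
    }
}

// Hypothetical receiver on a search node.
final class SearchNodeApplier {
    private final ConcurrentMap<Long, IndexData> latest = new ConcurrentHashMap<>();

    // Atomically keeps whichever copy has the higher version, so duplicated or
    // reordered broadcasts cannot roll a listing back. Returns true if applied.
    boolean apply(IndexData update) {
        IndexData kept = latest.merge(update.listingId, update,
                (current, incoming) -> incoming.version > current.version ? incoming : current);
        return kept == update;
    }

    IndexData get(long listingId) {
        return latest.get(listingId);
    }
}
```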
  15. What’s in the forward index?
      - Holds all the metadata about a listing required by scoring and filtering.
      - We also have complicated business rules to calculate price, availability, Instant Book, etc., which need a lot of metadata.
      - ~50 fields built from multiple data sources and updated in real time.

          public final class ForwardIndexData {
              private final CalendarData calendarData;
              private final PricingData pricingData;
              private final HostInfo hostInfo;
              // . . .
          }

          public final class CalendarData {
              private final DateRanges reservationDates;
              private final SeasonalValues startDayOfWeeks;
              // . . .
          }

          public final class SeasonalValues<T> {
              private final DateRange startDate;
              private final T value;
              // . . .
          }
  16. Availability
      - Depends on the profile of the guest.
      - The check-in date must be one of the valid start days of the week.
      - Must satisfy seasonal minimum nights.
      - There must be enough preparation time for the host.
      - Busy dates are imported from external calendars to avoid booking conflicts.
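Several of these rules can be sketched with `java.time`. Everything here is hypothetical:

- The `AvailabilityRules` class and its fields are invented for illustration.
- A single `minNights` value stands in for whichever seasonal minimum applies to the requested dates.
- Preparation time is modeled as free nights required on either side of any busy night.

Guest-profile rules are left out.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Set;

// Hypothetical subset of availability rules for one listing.
final class AvailabilityRules {
    final Set<DayOfWeek> validStartDays; // allowed check-in days of the week
    final int minNights;                 // stand-in for the seasonal minimum for these dates
    final int preparationDays;           // free nights required next to any busy night
    final Set<LocalDate> busyNights;     // reserved nights plus nights imported from external calendars

    AvailabilityRules(Set<DayOfWeek> validStartDays, int minNights, int preparationDays, Set<LocalDate> busyNights) {
        this.validStartDays = validStartDays;
        this.minNights = minNights;
        this.preparationDays = preparationDays;
        this.busyNights = busyNights;
    }

    boolean isAvailable(LocalDate checkIn, LocalDate checkOut) {
        long nights = ChronoUnit.DAYS.between(checkIn, checkOut);
        if (nights < Math.max(1, minNights)) {
            return false; // empty stay or below the minimum-nights rule
        }
        if (!validStartDays.contains(checkIn.getDayOfWeek())) {
            return false; // check-in day is not an allowed start day
        }
        // Every night of the stay must be more than preparationDays nights away from any busy night.
        for (LocalDate night = checkIn; night.isBefore(checkOut); night = night.plusDays(1)) {
            for (int d = 0; d <= preparationDays; d++) {
                if (busyNights.contains(night.minusDays(d)) || busyNights.contains(night.plusDays(d))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```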
  17. Pricing
      - Depends on the number of guests and the number of nights.
      - How close or far away the check-in date is.
      - How long the trip is, and whether the host has weekly and monthly pricing.
      - Whether there is a special price override for these nights.
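Two of these pricing factors, special-price overrides and weekly/monthly pricing, can be sketched as a per-night sum followed by a length-of-stay discount. This is a hypothetical model. The `PricingRules` fields, the 7- and 28-night thresholds, and expressing weekly/monthly pricing as percentage discounts are all assumptions. Guest-count and lead-time adjustments are left out. Amounts are kept in integer cents to avoid floating-point rounding.

```java
import java.time.LocalDate;
import java.util.Map;

// Hypothetical pricing rules for one listing; amounts are in cents.
final class PricingRules {
    final long baseNightlyCents;
    final Map<LocalDate, Long> specialPriceCents; // per-night overrides set by the host
    final int weeklyDiscountPercent;               // assumed to apply to stays of 7+ nights
    final int monthlyDiscountPercent;              // assumed to apply to stays of 28+ nights

    PricingRules(long baseNightlyCents, Map<LocalDate, Long> specialPriceCents,
                 int weeklyDiscountPercent, int monthlyDiscountPercent) {
        this.baseNightlyCents = baseNightlyCents;
        this.specialPriceCents = specialPriceCents;
        this.weeklyDiscountPercent = weeklyDiscountPercent;
        this.monthlyDiscountPercent = monthlyDiscountPercent;
    }

    long totalCents(LocalDate checkIn, LocalDate checkOut) {
        long subtotal = 0;
        int nights = 0;
        for (LocalDate night = checkIn; night.isBefore(checkOut); night = night.plusDays(1)) {
            // A special price for this night overrides the base nightly price.
            subtotal += specialPriceCents.getOrDefault(night, baseNightlyCents);
            nights++;
        }
        int discount = nights >= 28 ? monthlyDiscountPercent
                : nights >= 7 ? weeklyDiscountPercent
                : 0;
        return subtotal * (100 - discount) / 100;
    }
}
```

For example, with a base price of 10,000 cents, a 15,000-cent override on one night, and a 10% weekly discount, a 7-night stay costs (6 × 10,000 + 15,000) × 0.9 = 67,500 cents.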
  18. Instant Book
      - Depends on the number of guests and the number of nights.
      - Profile of the guest, e.g. positive reviews and whether the guest has a profile photo.
      - How much preparation time the host has, etc.
  19. Why did we need our custom Forward Index?
      Requirements:
      - Needs to store objects with 50-100 fields as values, keyed by listing id.
      - Should avoid the cost of serialization/deserialization on every fetch.
      - Data must be available in memory for fast lookup, but also persisted on disk.
      - Highly concurrent: the writer shouldn't block the readers (one writer but >100 reader threads).
  20. Forward Index Interface

          // Forward Index
          import java.util.Map;

          public interface ForwardIndex<V> {
              Map<Long, V> asMap();
              void put(long id, V value);
              void putAll(Map<Long, V> values);
              void remove(long id);
              void commit();
          }

          // Writer
          forwardIndex.put(listingId, listingData);
          // . . .
          // Write to disk and also make it visible to readers.
          forwardIndex.commit();

          // Reader
          // Fetch forward index data from the in-memory map.
          Map<Long, ListingData> fwdIndex = forwardIndex.asMap();
          ListingData data = fwdIndex.get(listingId);

          // Use it to evaluate business rules.
          checkAvailability(data, searchRequest);
          calculatePrice(data, searchRequest);
  21. Forward Index Implementation
      Diagram labels: NonBlocking In-Memory HashMap; DiskStore

          // Forward Index
          import java.util.Collections;
          import java.util.Map;

          public class ForwardIndexStore<V> implements ForwardIndex<V> {
              private final DB<V> diskStore;
              private final Cache<V> cache;
              // . . .

              @Override
              public Map<Long, V> asMap() {
                  return Collections.unmodifiableMap(cache);
              }

              @Override
              public void put(long id, V value) {
                  diskStore.put(id, value);
                  cache.put(id, value);
              }

              // . . .

              @Override
              public void commit() {
                  diskStore.commit();
                  cache.commit();
              }
          }
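The requirements in slide 19 can be met in a simple way with copy-on-write snapshots published through a volatile reference. Those requirements are one writer, 100+ readers that never block, and writes that become visible only on commit. This is a minimal sketch under those assumptions, not the NonBlocking HashMap plus DiskStore design on the slide. `DiskStore`, `InMemoryDiskStore`, and `SnapshotForwardIndex` are hypothetical names, and the in-memory store is only a placeholder for real persistence. The `ForwardIndex` interface is repeated so the block compiles on its own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// The interface from the slide, repeated so this sketch is self-contained.
interface ForwardIndex<V> {
    Map<Long, V> asMap();
    void put(long id, V value);
    void putAll(Map<Long, V> values);
    void remove(long id);
    void commit();
}

// Hypothetical persistence hook; a real implementation would write to disk.
interface DiskStore<V> {
    void put(long id, V value);
    void remove(long id);
    void commit();
}

// Trivial in-memory stand-in for DiskStore, for illustration only.
final class InMemoryDiskStore<V> implements DiskStore<V> {
    final Map<Long, V> data = new HashMap<>();
    @Override public void put(long id, V value) { data.put(id, value); }
    @Override public void remove(long id) { data.remove(id); }
    @Override public void commit() { /* nothing to flush */ }
}

// Single-writer, many-reader forward index using copy-on-write snapshots.
final class SnapshotForwardIndex<V> implements ForwardIndex<V> {
    private final DiskStore<V> diskStore;
    // Staging copy; must only be touched by the single writer thread.
    private final Map<Long, V> working = new HashMap<>();
    // Last committed snapshot; the volatile write in commit() publishes it to readers.
    private volatile Map<Long, V> snapshot = Collections.emptyMap();

    SnapshotForwardIndex(DiskStore<V> diskStore) { this.diskStore = diskStore; }

    // Readers never lock: they read the current immutable snapshot.
    @Override public Map<Long, V> asMap() { return snapshot; }

    @Override public void put(long id, V value) {
        diskStore.put(id, value);
        working.put(id, value);
    }

    @Override public void putAll(Map<Long, V> values) {
        values.forEach((id, value) -> put(id, value));
    }

    @Override public void remove(long id) {
        diskStore.remove(id);
        working.remove(id);
    }

    // Persists pending writes, then swaps in an immutable copy for readers.
    @Override public void commit() {
        diskStore.commit();
        snapshot = Collections.unmodifiableMap(new HashMap<>(working));
    }
}
```

Values are stored as live objects, so a read involves no deserialization. Copying the whole map costs O(n) per commit, so this sketch only fits if the writer batches many updates into each commit.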
  22. Ranking Problem
      - Not a text search problem.
      - Users are almost never searching for a specific item; rather, they’re looking to “Discover”.
      - The most common component of a query is location.
      - Highly personalized: the user is part of the query.
      - Optimizing for conversion (Search -> Inquiry -> Booking).
      - Evolution through continuous experimentation.
  23. Ranking Components
      - Relevance
      - Quality
      - Bookability
      - Personalization
      - Desirability of location
      - etc.
  24. Several hundred signals are used to build machine learning models:
      - Properties of the listing (reviews, location, etc.)
      - Behavioral signals (mined from request logs)
      - Image quality and clickability (computer vision)
      - Host behavior (response time/rate, cancellations, etc.)
      - Host preferences model
      Diagram labels: DB snapshots; Logs
  25. Life of a Query
      Diagram labels:
      - Query Understanding: Geocoding; Configuring retrieval options; Choosing ranking models
      - Retrieval Populator
      - First Pass Scorer: Quality; Bookability; Relevance
      - Second Pass Ranking
      - Result Generation
      - AirEvents
      - Filtering by Price and Availability
      - Result counts: 25 results, 2000 results, 25 results
  26. Second Pass Ranking
      - Traditional ranking scores each listing independently, then sorts by that score.
      - In contrast, second pass ranking operates on the entire list at once.
      - This makes it possible to implement features like result diversity.
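A greedy re-ranker makes the contrast concrete. A first pass sorts by independent scores. A second pass can also account for what has already been placed in the list. This is a hypothetical example of a list-wise feature (result diversity), not the model from the slides. The `Listing` class, the `neighborhood` attribute, and the fixed per-repeat penalty are assumptions. The method repeatedly picks the listing whose score is highest after subtracting a penalty for each listing already chosen from the same neighborhood.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical candidate with a first-pass score.
final class Listing {
    final long id;
    final String neighborhood;
    final double score;

    Listing(long id, String neighborhood, double score) {
        this.id = id;
        this.neighborhood = neighborhood;
        this.score = score;
    }
}

final class DiversityReranker {
    // Greedily builds the result list; each candidate's score is reduced by
    // `penalty` for every already-picked listing in the same neighborhood.
    static List<Listing> rerank(List<Listing> scored, double penalty) {
        List<Listing> remaining = new ArrayList<>(scored);
        List<Listing> result = new ArrayList<>();
        Map<String, Integer> pickedPerNeighborhood = new HashMap<>();
        while (!remaining.isEmpty()) {
            Listing best = null;
            double bestAdjusted = Double.NEGATIVE_INFINITY;
            for (Listing candidate : remaining) {
                double adjusted = candidate.score
                        - penalty * pickedPerNeighborhood.getOrDefault(candidate.neighborhood, 0);
                if (adjusted > bestAdjusted) {
                    best = candidate;
                    bestAdjusted = adjusted;
                }
            }
            remaining.remove(best);
            result.add(best);
            pickedPerNeighborhood.merge(best.neighborhood, 1, Integer::sum);
        }
        return result;
    }
}
```

With scores 1.0 and 0.9 in one neighborhood and 0.8 in another, and a penalty of 0.3, the order becomes 1.0, 0.8, 0.9 instead of the plain sort. Each pick rescans the remaining candidates, so the cost is quadratic in list length. That is acceptable for a short list like the 25 results shown in the diagram.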
