SEARCH AND RELEVANCE AT SCALE FOR ONLINE CLASSIFIEDS
STAY CONNECTED
Twitter @activate_conf
Facebook @activateconf
#Activate19
ROGER RAFANELL
Senior Big Data Engineer | letgo
About me
?
letgo
• Second-hand marketplace app
• Founded 2015
• Main markets: US & Turkey
• 5M downloads/month & 20M MAU
Agenda
• Introduction to search in classifieds
• Search in the past
• Building a new search platform at scale
• Enabling data science
• The future of search platform
Introduction
Search in classifieds
100K items/day
Reposted
We cannot cache results!!!
Sold / Deleted
Hyperlocality search
letgo catalog at (39.89, -77.08) | letgo catalog at (39.28, -76.69)
Results in Washington, D.C. ≠ Results in Baltimore
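A minimal sketch of why hyperlocal results differ per user, assuming a plain great-circle (haversine) radius filter. The two coordinates are the ones shown on the slide; the listing records, the `nearby` helper and the 30 km radius are hypothetical and only illustrate the idea, not letgo's actual geo query.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(listings, lat, lon, radius_km):
    """Return the IDs of listings within radius_km of the user position."""
    return [
        item["id"]
        for item in listings
        if haversine_km(lat, lon, item["lat"], item["lon"]) <= radius_km
    ]


# Hypothetical listings placed at the two coordinates from the slide.
listings = [
    {"id": "a", "lat": 39.89, "lon": -77.08},
    {"id": "b", "lat": 39.28, "lon": -76.69},
]

# Same catalog, same query, different user position: different results.
user_1 = nearby(listings, 39.89, -77.08, radius_km=30)
user_2 = nearby(listings, 39.28, -76.69, radius_km=30)
```

Because the result set depends on the exact position of each user, a shared result cache keyed only on the query text would not help.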
User input data example
• Typos
• Slang
• Poor pictures
• Wrong information
• Ambiguity
• Weirdness
Explorers vs Deal Seekers (Marco Polo vs Hernán Cortés)
Explorers
• Like browsing
• Recall > Precision
• “Cars”
Deal Seekers
• Search, filter, haggle
• Precision > Recall
• “2015 honda civic lx”
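As a worked example of the recall/precision trade-off between the two user types, here is a small sketch. The listing IDs and both result lists are made up for illustration; only the standard definitions of precision and recall are assumed.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical catalog: listings 1 and 2 are the only 2015 Honda Civic LX cars.
relevant_for_deal_seeker = {1, 2}

# A broad, explorer-style result set ("cars") versus a strict exact match.
broad_results = [1, 2, 3, 4, 5, 6, 7, 8]
narrow_results = [1]

broad = precision_recall(broad_results, relevant_for_deal_seeker)
narrow = precision_recall(narrow_results, relevant_for_deal_seeker)
```

The broad result set finds every relevant listing but buries it among others (high recall, low precision), which suits browsing. The narrow one returns only relevant listings but misses one (high precision, lower recall).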
Search in the past
Early 2015
[Architecture diagram: Listings API and Search API]
Late 2017
[Architecture diagram: Listings API and Search API in front of a sharded search cluster (Shard 1 ... Shard 5, 8 replicas per shard); tiers labeled x 3, x 2 and x 1, each scaled by adding nodes (↑nodes)]
Full indexation: 24 h
Instances recovery: 8 h
Response time: 150 ms
Operation limitations
• Slow full catalog imports
• Slow reactivity to traffic spikes
• High costs
Business limitations
• No enrichment at import time
• Not easy to evolve schemas
• Not agile!
No A/B testing
No data science
Search API limitations
• One API request -> One search query
• PHP + Solarium (↓ concurrency)
• High costs
Throughput: 200 rps
Response time: 400 ms
Service instances: 60+
The platform was not scaling
Building a new search platform
Analysis
• Spot oldest queries sent by search API
• ↑ Traffic for fresh listings
• All fields were stored
Catalog retention: 3 months
Highly requested listings: 15 min
Looking for a strategy
• Keep only the last 3 months of listings
• Index only the queried fields
• Store only listing IDs
Old catalog size: >100 GB
New catalog size: <4 GB
Solr was used as a key-value storage
NOT as a full-text search engine
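The slimming strategy (3-month retention, index only queried fields, store only IDs) can be sketched as a function that turns a full listing into a small index document. This is an illustration under stated assumptions: the field names, the 90-day approximation of "3 months" and the `stored`/`indexed` split are hypothetical and do not reproduce the actual Solr schema.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # "last 3 months", approximated as 90 days
QUERIED_FIELDS = ("title", "category", "lat", "lon")  # hypothetical field list


def to_index_doc(listing, now):
    """Build a slim index document, or return None if the listing is too old."""
    if now - listing["created_at"] > RETENTION:
        return None
    return {
        # Only the ID is stored (retrievable from the index).
        "stored": {"id": listing["id"]},
        # Only fields that queries actually use are indexed (searchable).
        "indexed": {f: listing[f] for f in QUERIED_FIELDS if f in listing},
    }


now = datetime(2019, 9, 1, tzinfo=timezone.utc)
fresh = {
    "id": "l-1",
    "title": "mountain bike",
    "category": "sports",
    "lat": 39.28,
    "lon": -76.69,
    "description": "long free text that no query touches",
    "created_at": datetime(2019, 8, 20, tzinfo=timezone.utc),
}
old = dict(fresh, id="l-2", created_at=datetime(2019, 1, 1, tzinfo=timezone.utc))

fresh_doc = to_index_doc(fresh, now)
old_doc = to_index_doc(old, now)
```

Everything else about a listing then has to come from a separate catalog store, which is the trade-off the next slide lists under "THE BAD".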
THE BAD
• Where to store all listings fields?
• Need a catalog storage (database)
• Need also a fast serving layer
• Near real-time indexing constraints
THE GOOD
• No more sharding (↓index size)
• Standalone Solr instances
• High bump in performance
Drawing a plan
Big Data to the rescue
• NRT pipeline to keep the listings catalog up-to-date
• Batch pipeline to fully rebuild the catalog
The new architecture
Self-healing
The Search indexer ETL
• Fetch listings
• Enrich listings
• Fetch verticals features
• Normalize attributes
• Anonymize PII
• Store to DB
• Store to fast layer
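The ETL stages above can be sketched as a chain of small pure functions followed by two writes. Everything here is a simplified assumption: the record fields, the in-memory dicts standing in for the database and the fast layer, and the hash-based masking (which is pseudonymization rather than full anonymization) are placeholders, not the production pipeline.

```python
import hashlib


def enrich(listing, vertical_features):
    """Attach vertical-specific features (hypothetical lookup by category)."""
    out = dict(listing)
    out["vertical"] = vertical_features.get(listing["category"], {})
    return out


def normalize_attributes(listing):
    """Lowercase the title and collapse repeated whitespace."""
    out = dict(listing)
    out["title"] = " ".join(listing["title"].lower().split())
    return out


def anonymize_pii(listing):
    """Replace the seller identifier with a truncated hash (a stand-in only)."""
    out = dict(listing)
    out["seller"] = hashlib.sha256(listing["seller"].encode("utf-8")).hexdigest()[:12]
    return out


def run_etl(listings, vertical_features, database, fast_layer):
    """Run every stage per listing, then write to both stores."""
    for raw in listings:
        listing = anonymize_pii(normalize_attributes(enrich(raw, vertical_features)))
        database[listing["id"]] = listing
        fast_layer[listing["id"]] = listing


database, fast_layer = {}, {}
run_etl(
    [{"id": "l-1", "title": "  Mountain   BIKE ", "category": "sports", "seller": "roger@example.com"}],
    {"sports": {"has_brand_filter": True}},
    database,
    fast_layer,
)
```

Keeping each stage a pure function makes it straightforward to reuse the same transformations in both the near real-time and the batch pipeline.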
Search engine performance
Throughput: ↑12x
Recovery time: 12 min
Latency: ↓8x
Catalog performance
Catalog (fast layer) latency: 16 ms
Catalog (database) latency: 40 ms
Worst-case latency: 56 ms
Gluing all the pieces
Search API redesign
[Architecture diagram: Listings API, Search Library and Search API (x 1 each), exchanging listing IDs]
Search library - Scala to rule them all
• Wrap the search retrieval logic
• One request → Multiple parallel queries to Solr
• Non-blocking I/O with solrS, persistence drivers
• Seamless integration with Finagle framework
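The "one request → multiple parallel queries" idea can be illustrated with a small non-blocking fan-out. The talk's library is written in Scala on top of Finagle and solrS; the sketch below uses Python's `asyncio` only to show the pattern. `query_solr` is a stand-in that fakes a network call, and the three query kinds (`fresh`, `nearby`, `popular`) are hypothetical.

```python
import asyncio


async def query_solr(kind, query, delay):
    """Stand-in for a non-blocking search call; returns fake listing IDs."""
    await asyncio.sleep(delay)  # simulates network latency without blocking
    return [f"{kind}:{query}:{i}" for i in range(2)]


async def search(query):
    """Fan one incoming request out to several concurrent queries."""
    fresh, nearby, popular = await asyncio.gather(
        query_solr("fresh", query, 0.05),
        query_solr("nearby", query, 0.05),
        query_solr("popular", query, 0.05),
    )
    # Merge in a fixed order, dropping duplicate IDs while keeping order.
    return list(dict.fromkeys(fresh + nearby + popular))


results = asyncio.run(search("bike"))
```

Since the three calls overlap, the total wait is close to the slowest single query rather than the sum of all three.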
Search API - Scala to rule them all
• Based on Finagle services framework
• Finatra/Finagle = ↑concurrency & ↓resources
• Enable backend driven A/B testing
• Personalized search
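Backend-driven A/B testing usually needs a stable assignment of users to variants. One common approach, shown here as an assumption rather than as letgo's implementation, is to hash the user ID together with the experiment name. The function name, the experiment name and the variant labels are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# The same user always lands in the same variant of the same experiment.
first = assign_variant("user-42", "learning-to-rank")
second = assign_variant("user-42", "learning-to-rank")

# Across many users, every variant receives traffic.
seen = {assign_variant(f"user-{i}", "learning-to-rank") for i in range(1000)}
```

Because the assignment is computed on the backend from the request alone, no client release is needed to start or stop an experiment.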
Overall performance
↑Throughput & ↓Latency, Resources, Cost Reduction
13x 100x↓20x
Enabling data science
SEARCH & RELEVANCE
Unlocked data science projects
• Recall
– Query expansion
• Precision
– Learning to Rank
Improving recall - Query expansion
Searching for: ‘mountain bike’
Similar queries → Cause
• blue mountain bicycle → Synonyms
• mountain and road bike → OK
• mountain bike frame → Relevant?
• bicicleta de montaña → Language
• scout montain bike → Spelling
• mountain bike lock → Relevant?
[Second panel: the same six queries under the heading "Expected Behavior"]
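A minimal sketch of query expansion covering two of the causes listed above (synonyms and spelling). The tiny dictionaries are hypothetical; a real system would learn or curate them. The output is a generic boolean query string, not necessarily the syntax used in the talk.

```python
SYNONYMS = {"bike": ["bicycle"], "bicycle": ["bike"]}  # hypothetical synonym table
SPELLING = {"montain": "mountain"}                     # hypothetical corrections


def expand_query(query):
    """Correct known misspellings, then OR each term with its synonyms."""
    tokens = [SPELLING.get(t, t) for t in query.lower().split()]
    groups = []
    for token in tokens:
        alternatives = [token] + SYNONYMS.get(token, [])
        if len(alternatives) == 1:
            groups.append(token)
        else:
            groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)


expanded = expand_query("montain bike")
```

Expansion raises recall (the query now also matches "mountain bicycle"), which is why cases such as "mountain bike lock" still need a ranking step to protect precision.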
Improving precision - Learning to Rank
‘mountain bike’: items retrieved
[Figure: results for ‘mountain bike’; one listing is annotated as posted 2 months ago and 30 miles away]
[Figure: conversions on query ‘bike’; result grid with positions 1 to 6 and labels y = 0, y = 0, y = 1]
[Figure: search conversions, before vs. after]
Text score = indicator of relevance.
Freshness and distance are key!
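To make the role of freshness and distance concrete, here is a re-ranking sketch that combines the text score with two decay features. It is a hand-weighted stand-in, not a learned model: in Learning to Rank the weights would be fitted from conversion labels (y = 1 or y = 0). The weights, decay constants and example items are all hypothetical.

```python
import math

# Hypothetical hand-set weights; a learned model would fit these from conversions.
WEIGHTS = {"text": 1.0, "freshness": 2.0, "proximity": 2.0}


def features(item):
    """Turn raw listing signals into features in the range (0, 1]."""
    return {
        "text": item["text_score"],
        "freshness": math.exp(-item["age_days"] / 30.0),
        "proximity": math.exp(-item["distance_miles"] / 10.0),
    }


def score(item):
    """Linear combination of the features."""
    f = features(item)
    return sum(WEIGHTS[name] * f[name] for name in WEIGHTS)


def rerank(items):
    """Order items by descending combined score."""
    return sorted(items, key=score, reverse=True)


items = [
    # Best text match, but posted 2 months ago and 30 miles away.
    {"id": "a", "text_score": 0.9, "age_days": 60, "distance_miles": 30},
    # Slightly weaker text match, posted yesterday and 2 miles away.
    {"id": "b", "text_score": 0.7, "age_days": 1, "distance_miles": 2},
]

by_text_only = [i["id"] for i in sorted(items, key=lambda i: i["text_score"], reverse=True)]
by_model = [i["id"] for i in rerank(items)]
```

Ranking by text score alone puts the stale, distant listing first; adding freshness and proximity flips the order.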
The future of search platform
Work in progress
• Migration to Solr 8 (↓latency & better security)
• Iterate Learning to Rank
• Real-time personalization
• Visual categorization (Reveal)
Conclusions
Raising the bar
• Indexer pipeline enables data enrichment & transformations
• Simplified search architecture with lightweight in-memory indices
• Fault-tolerant and self-healing infrastructure and processes
• Unlock real data science in
Tech Stack
Airflow Redshift
THANK YOU
roger.rafanell@letgo.com
https://www.linkedin.com/in/rogerrafanell
https://we.letgo.com/careers

More Related Content

What's hot

Visualizing large datasets with elasticsearch and kibana
Visualizing large datasets with elasticsearch and kibanaVisualizing large datasets with elasticsearch and kibana
Visualizing large datasets with elasticsearch and kibana
Dan Fey
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Value Association
 
Using R in power BI
Using R in power BIUsing R in power BI
Using R in power BI
Guruprasad Vijayarao
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
✔ Eric David Benari, PMP
 
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
Albert Wong
 
RIPE Atlas
RIPE AtlasRIPE Atlas
RIPE Atlas
RIPE NCC
 
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
✔ Eric David Benari, PMP
 
Life is but a Stream
Life is but a StreamLife is but a Stream
Life is but a Stream
Databricks
 
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Spark Summit
 
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...
Zhijie Shen
 
R training at Aimia
R training at AimiaR training at Aimia
R training at Aimia
Ali Arsalan Kazmi
 
Distilled Power BI Updates for April 2016
Distilled Power BI Updates for April 2016Distilled Power BI Updates for April 2016
Distilled Power BI Updates for April 2016
Jen Stirrup
 
Real time ads personalization @ Spotify
Real time ads personalization @ SpotifyReal time ads personalization @ Spotify
Real time ads personalization @ Spotify
Kinshuk Mishra
 
Bisp list of courses
Bisp list of coursesBisp list of courses
Bisp list of courses
Amit Sharma
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
Rittman Analytics
 
Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs
Objectivity
 
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooksNotebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Michelle Ufford
 
Zipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering FrameworkZipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering Framework
Databricks
 
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
Khai Tran
 
DC Web API Meetup Oct 4 2016
DC Web API Meetup Oct 4 2016DC Web API Meetup Oct 4 2016

What's hot (20)

Visualizing large datasets with elasticsearch and kibana
Visualizing large datasets with elasticsearch and kibanaVisualizing large datasets with elasticsearch and kibana
Visualizing large datasets with elasticsearch and kibana
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICS
 
Using R in power BI
Using R in power BIUsing R in power BI
Using R in power BI
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
 
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
 
RIPE Atlas
RIPE AtlasRIPE Atlas
RIPE Atlas
 
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
 
Life is but a Stream
Life is but a StreamLife is but a Stream
Life is but a Stream
 
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
 
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...
Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Ha...
 
R training at Aimia
R training at AimiaR training at Aimia
R training at Aimia
 
Distilled Power BI Updates for April 2016
Distilled Power BI Updates for April 2016Distilled Power BI Updates for April 2016
Distilled Power BI Updates for April 2016
 
Real time ads personalization @ Spotify
Real time ads personalization @ SpotifyReal time ads personalization @ Spotify
Real time ads personalization @ Spotify
 
Bisp list of courses
Bisp list of coursesBisp list of courses
Bisp list of courses
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs
 
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooksNotebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks
 
Zipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering FrameworkZipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering Framework
 
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
Beam summit 2019 - Unifying Batch and Stream Data Processing with Apache Calc...
 
DC Web API Meetup Oct 4 2016
DC Web API Meetup Oct 4 2016DC Web API Meetup Oct 4 2016
DC Web API Meetup Oct 4 2016
 

Similar to Activate 2019 - Search and relevance at scale for online classifieds

Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
ALTER WAY
 
Meetup070416 Presentations
Meetup070416 PresentationsMeetup070416 Presentations
Meetup070416 Presentations
Ana Rebelo
 
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
Amazon Web Services
 
Implementing Site Search in CQ5 / AEM
Implementing Site Search in CQ5 / AEMImplementing Site Search in CQ5 / AEM
Implementing Site Search in CQ5 / AEM
rtpaem
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data Platform
VMware Tanzu
 
Filipe paternot - Case Study: Zabbix Deployment at Globo.com
Filipe paternot - Case Study: Zabbix Deployment at Globo.comFilipe paternot - Case Study: Zabbix Deployment at Globo.com
Filipe paternot - Case Study: Zabbix Deployment at Globo.com
Zabbix
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Tech Triveni
 
Difference between data warehouse and data mining
Difference between data warehouse and data miningDifference between data warehouse and data mining
Difference between data warehouse and data mining
maxonlinetr
 
Le big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entrepriseLe big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entreprise
Rubedo, a WebTales solution
 
Graphs in Action: In-depth look at Neo4j in Production
Graphs in Action: In-depth look at Neo4j in ProductionGraphs in Action: In-depth look at Neo4j in Production
Graphs in Action: In-depth look at Neo4j in Production
Neo4j
 
Fast, Powerful and Scalable Analytics
Fast, Powerful and Scalable AnalyticsFast, Powerful and Scalable Analytics
Fast, Powerful and Scalable Analytics
MariaDB plc
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Spark Summit
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Lucidworks
 
Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks
Nishant Gandhi
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
Tao Feng
 
Elasticsearch : petit déjeuner du 13 mars 2014
Elasticsearch : petit déjeuner du 13 mars 2014Elasticsearch : petit déjeuner du 13 mars 2014
Elasticsearch : petit déjeuner du 13 mars 2014
ALTER WAY
 
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the FieldPartner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field
Denodo
 
Netflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case StudyNetflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case Study
Ketan Patil
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
lucenerevolution
 
Building the Global Open Knowledgebase (ER&L 2013)
Building the Global Open Knowledgebase (ER&L 2013)Building the Global Open Knowledgebase (ER&L 2013)
Building the Global Open Knowledgebase (ER&L 2013)
GOKb Project
 

Similar to Activate 2019 - Search and relevance at scale for online classifieds (20)

Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
Meetup070416 Presentations
Meetup070416 PresentationsMeetup070416 Presentations
Meetup070416 Presentations
 
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
 
Implementing Site Search in CQ5 / AEM
Implementing Site Search in CQ5 / AEMImplementing Site Search in CQ5 / AEM
Implementing Site Search in CQ5 / AEM
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data Platform
 
Filipe paternot - Case Study: Zabbix Deployment at Globo.com
Filipe paternot - Case Study: Zabbix Deployment at Globo.comFilipe paternot - Case Study: Zabbix Deployment at Globo.com
Filipe paternot - Case Study: Zabbix Deployment at Globo.com
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...
 
Difference between data warehouse and data mining
Difference between data warehouse and data miningDifference between data warehouse and data mining
Difference between data warehouse and data mining
 
Le big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entrepriseLe big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entreprise
 
Graphs in Action: In-depth look at Neo4j in Production
Graphs in Action: In-depth look at Neo4j in ProductionGraphs in Action: In-depth look at Neo4j in Production
Graphs in Action: In-depth look at Neo4j in Production
 
Fast, Powerful and Scalable Analytics
Fast, Powerful and Scalable AnalyticsFast, Powerful and Scalable Analytics
Fast, Powerful and Scalable Analytics
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
 
Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
Elasticsearch : petit déjeuner du 13 mars 2014
Elasticsearch : petit déjeuner du 13 mars 2014Elasticsearch : petit déjeuner du 13 mars 2014
Elasticsearch : petit déjeuner du 13 mars 2014
 
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the FieldPartner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field
 
Netflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case StudyNetflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case Study
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Building the Global Open Knowledgebase (ER&L 2013)
Building the Global Open Knowledgebase (ER&L 2013)Building the Global Open Knowledgebase (ER&L 2013)
Building the Global Open Knowledgebase (ER&L 2013)
 

More from Roger Rafanell Mas

How to build a self-service data platform and what it can do for your business?
How to build a self-service data platform and what it can do for your business?How to build a self-service data platform and what it can do for your business?
How to build a self-service data platform and what it can do for your business?
Roger Rafanell Mas
 
Pensamiento lateral
Pensamiento lateralPensamiento lateral
Pensamiento lateral
Roger Rafanell Mas
 
Storm distributed cache workshop
Storm distributed cache workshopStorm distributed cache workshop
Storm distributed cache workshop
Roger Rafanell Mas
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
Roger Rafanell Mas
 
IS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialIS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorial
Roger Rafanell Mas
 
MRI Energy-Efficient Cloud Computing
MRI Energy-Efficient Cloud ComputingMRI Energy-Efficient Cloud Computing
MRI Energy-Efficient Cloud Computing
Roger Rafanell Mas
 
SDS Amazon RDS
SDS Amazon RDSSDS Amazon RDS
SDS Amazon RDS
Roger Rafanell Mas
 
EEDC Programming Models
EEDC Programming ModelsEEDC Programming Models
EEDC Programming Models
Roger Rafanell Mas
 
EEDC Intelligent Placement of Datacenters
EEDC Intelligent Placement of DatacentersEEDC Intelligent Placement of Datacenters
EEDC Intelligent Placement of Datacenters
Roger Rafanell Mas
 
EEDC Everthing as a Service
EEDC Everthing as a ServiceEEDC Everthing as a Service
EEDC Everthing as a Service
Roger Rafanell Mas
 
EEDC Apache Pig Language
EEDC Apache Pig LanguageEEDC Apache Pig Language
EEDC Apache Pig Language
Roger Rafanell Mas
 
EEDC Distributed Systems
EEDC Distributed SystemsEEDC Distributed Systems
EEDC Distributed Systems
Roger Rafanell Mas
 
EEDC SOAP vs REST
EEDC SOAP vs RESTEEDC SOAP vs REST
EEDC SOAP vs REST
Roger Rafanell Mas
 

More from Roger Rafanell Mas (13)

How to build a self-service data platform and what it can do for your business?
How to build a self-service data platform and what it can do for your business?How to build a self-service data platform and what it can do for your business?
How to build a self-service data platform and what it can do for your business?
 
Pensamiento lateral
Pensamiento lateralPensamiento lateral
Pensamiento lateral
 
Storm distributed cache workshop
Storm distributed cache workshopStorm distributed cache workshop
Storm distributed cache workshop
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
IS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialIS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorial
 
MRI Energy-Efficient Cloud Computing
MRI Energy-Efficient Cloud ComputingMRI Energy-Efficient Cloud Computing
MRI Energy-Efficient Cloud Computing
 
SDS Amazon RDS
SDS Amazon RDSSDS Amazon RDS
SDS Amazon RDS
 
EEDC Programming Models
EEDC Programming ModelsEEDC Programming Models
EEDC Programming Models
 
EEDC Intelligent Placement of Datacenters
EEDC Intelligent Placement of DatacentersEEDC Intelligent Placement of Datacenters
EEDC Intelligent Placement of Datacenters
 
EEDC Everthing as a Service
EEDC Everthing as a ServiceEEDC Everthing as a Service
EEDC Everthing as a Service
 
EEDC Apache Pig Language
EEDC Apache Pig LanguageEEDC Apache Pig Language
EEDC Apache Pig Language
 
EEDC Distributed Systems
EEDC Distributed SystemsEEDC Distributed Systems
EEDC Distributed Systems
 
EEDC SOAP vs REST
EEDC SOAP vs RESTEEDC SOAP vs REST
EEDC SOAP vs REST
 

Recently uploaded

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Envertis Software Solutions
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
pavan998932
 
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
kalichargn70th171
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
Activate 2019 - Search and relevance at scale for online classifieds

  • 1. SEARCH AND RELEVANCE AT SCALE FOR ONLINE CLASSIFIEDS
  • 2. STAY CONNECTED Twitter @activate_conf Facebook @activateconf #Activate19 Log in to wifi, follow Activate on social media, and download the event app where you can submit an evaluation after the session WIFI NETWORK: Activate2019 PASSWORD: Lucidworks DOWNLOAD THE ACTIVATE 2019 MOBILE APP Search Activate2019 in the App/Play store Or visit: http://crowd.cc/activate19
  • 3. ROGER RAFANELL Senior Big Data Engineer | letgo About me
  • 4. ?
  • 5. letgo • Second-hand marketplace app • Founded 2015 • Main markets: US & Turkey • 5M downloads/month & 20M MAU
  • 6. Agenda • Introduction to search in classifieds • Search in the past • Building a new search platform at scale • Enabling data science • The future of the search platform
  • 8. Introduction Search in classifieds 100K items/day Reposted We cannot cache results!!! Sold / Deleted
  • 9. Introduction Hyperlocality search letgo catalog at (39.89,-77.08) letgo catalog at (39.28, -76.69) Results in Washington, D.C. ≠ Results in Baltimore
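A minimal sketch of why the two catalogs on slide 9 differ, assuming a simple radius filter over listing coordinates. Only the two query points come from the slide; the `haversine_km` and `nearby` helpers, the 30 km radius, and the sample listings are hypothetical and not taken from the talk.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby(listings, lat, lon, radius_km):
    """Keep only the listings within radius_km of the searching user."""
    return [l for l in listings if haversine_km(lat, lon, l["lat"], l["lon"]) <= radius_km]

# The two query points shown on the slide.
POINT_A = (39.89, -77.08)
POINT_B = (39.28, -76.69)

# Hypothetical listings, one close to each point.
listings = [
    {"id": "a1", "lat": 39.90, "lon": -77.05},
    {"id": "b1", "lat": 39.30, "lon": -76.70},
]

near_a = nearby(listings, *POINT_A, radius_km=30)
near_b = nearby(listings, *POINT_B, radius_km=30)
```

With a radius smaller than the distance between the two points, the same catalog yields disjoint result sets, which is one reason results cannot be cached across users.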
  • 10. Introduction User input data example • Typos • Slang • Poor pictures • Wrong information • Ambiguity • Weirdness
  • 11. Introduction Explorers vs Deal Seekers (Marco Polo vs Hernán Cortés) Explorers: • Like browsing • Recall > Precision • “Cars” Deal Seekers: • Search, filter, haggle • Precision > Recall • “2015 honda civic lx”
  • 13. Search in the past Early 2015 Listings API Search API
  • 14. Search in the past Late 2017 [Architecture diagram: Listings API and Search API in front of sharded clusters (x 3, x 2, x 1), each with Shard 1 ... Shard 5 and 8 replicas / shard (↑nodes)]
  • 15. Operation limitations • Slow full catalog imports • Slow reactivity to traffic spikes • High costs 24h FULL INDEXATION 8h INSTANCES RECOVERY 150ms RESPONSE TIME
  • 16. Business limitations • No enrichment at import time • Not easy to evolve schemas • Not agile! NO A/B TESTING NO DATA SCIENCE
  • 17. Search API limitations • One API request -> One search query • PHP + Solarium (↓ concurrency) • High costs Search API 200rps THROUGHPUT 400ms RESPONSE TIME 60+ SERVICE INSTANCES
  • 18. The platform was not scaling
  • 20. Building a new search platform Analysis • Spot oldest queries sent by search API • ↑Traffic for fresh listings • All fields were stored 3 months CATALOG RETENTION 15 min HIGHLY REQUESTED LISTINGS
  • 21. Building a new search platform Looking for a strategy • Keep only the last 3 months of listings • Index only the queried fields • Store only listing IDs >100GB OLD CATALOG SIZE <4GB NEW CATALOG SIZE
  • 22. Solr was used as a key-value storage NOT as a full-text search engine
  • 23. Building a new search platform Drawing a plan THE GOOD • No more sharding (↓index size) • Standalone Solr instances • High bump in performance THE BAD • Where to store all listings fields? • Need a catalog storage (database) • Need also a fast serving layer • Near real-time indexing constraints
  • 24. Building a new search platform Big Data to the rescue • NRT pipeline to keep the listings catalog up-to-date • Batch pipeline to fully rebuild the catalog
  • 25. Building a new search platform The new architecture Self-healing
  • 26. Building a new search platform The Search indexer ETL • Fetch Listings • Enrich Listings • Fetch Verticals Features • Normalize Attributes • Anonymize PII • Store to DB • Store to Fast Layer
  • 27. Building a new search platform Search engine performance Throughput Recovery time Latency ↑12x ↓8x 12’
  • 28. Building a new search platform Catalog performance Catalog (Fast layer) Catalog (Database) Worst Case Latency 16ms 56ms 40ms
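A rough sketch of how listing IDs returned by the search engine might be turned back into full listings using the two catalog stores named on slide 28. The slides do not describe the lookup logic, so the fallback from fast layer to database, the write-back step, and the `hydrate` function itself are assumptions; plain dictionaries stand in for both stores.

```python
def hydrate(ids, fast_layer, database):
    """Return listing documents for the given IDs, preserving the ID order.

    fast_layer and database are dict-like stand-ins for the real stores.
    """
    # First pass: whatever the fast layer already holds.
    found = {i: fast_layer[i] for i in ids if i in fast_layer}
    # Second pass (assumed behaviour): fall back to the database for misses.
    for i in ids:
        if i in found:
            continue
        doc = database.get(i)
        if doc is not None:
            found[i] = doc
            fast_layer[i] = doc  # assumption: warm the fast layer for next time
    # IDs unknown to both stores (e.g. deleted listings) are dropped.
    return [found[i] for i in ids if i in found]

fast = {"l1": {"id": "l1", "title": "mountain bike"}}
db = {
    "l1": {"id": "l1", "title": "mountain bike"},
    "l2": {"id": "l2", "title": "road bike"},
}
docs = hydrate(["l2", "l1", "l9"], fast, db)
```

Under this assumed design, a fast-layer miss costs one extra database round trip, and later requests for the same listing are served from the fast layer.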
  • 29. Gluing all the pieces
  • 30. Building a new search platform Search API redesign x 1 x 1 x 1 Listings API Search Library Search API IDs
  • 31. Building a new search platform Search library - Scala to rule them all • Wrap the search retrieval logic • One request → Multiple parallel queries to Solr • Non-blocking I/O with solrS, persistence drivers • Seamless integration with Finagle framework
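The "one request → multiple parallel queries" idea on slide 31 can be sketched as follows. The talk's library is written in Scala on Finagle with solrS; this Python `asyncio` version only illustrates the fan-out pattern, and `run_query`, the query names, and the fake result IDs are all hypothetical stand-ins for real non-blocking Solr calls.

```python
import asyncio

async def run_query(name, delay_s):
    """Stand-in for a non-blocking search engine call; returns fake listing IDs."""
    await asyncio.sleep(delay_s)  # simulates network latency without blocking
    return {"query": name, "ids": [f"{name}-1", f"{name}-2"]}

async def search(request_queries):
    """Fan one API request out into several concurrent queries, then merge."""
    # gather() runs the coroutines concurrently and returns results in input order.
    results = await asyncio.gather(*(run_query(q, 0.01) for q in request_queries))
    return {r["query"]: r["ids"] for r in results}

merged = asyncio.run(search(["exact", "expanded", "nearby"]))
```

Because the queries overlap in time, the total latency is close to that of the slowest query rather than the sum of all of them.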
  • 32. Building a new search platform Search API - Scala to rule them all • Based on Finagle services framework • Finatra/Finagle = ↑concurrency & ↓resources • Enable backend driven A/B testing • Personalized search
  • 33. Building a new search platform Overall performance ↑Throughput & ↓Latency Resources Cost Reduction 13x 100x ↓20x
  • 35. Enabling data science Unlocked data science projects • Recall – Query expansion • Precision – Learning to Rank
  • 36. Enabling data science Improving recall - Query expansion Searching for: ‘mountain bike’ Table columns: Similar Queries, Cause, Expected Behavior • blue mountain bicycle → Synonyms • mountain and road bike → OK • mountain bike frame → Relevant? • bicicleta de montaña → Language • scout montain bike → Spelling • mountain bike lock → Relevant?
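A toy sketch of query expansion for the synonym and language cases on slide 36. The talk does not say how expansion is implemented; the `SYNONYMS` and `TRANSLATIONS` dictionaries and the `expand` function are hypothetical, and a production system would more plausibly derive such mappings from query logs.

```python
# Hypothetical hand-written mappings, for illustration only.
SYNONYMS = {"bike": ["bicycle"]}
TRANSLATIONS = {"mountain bike": ["bicicleta de montaña"]}

def expand(query):
    """Return the normalized query followed by its expanded variants."""
    normalized = " ".join(query.lower().split())
    tokens = normalized.split()
    variants = [normalized]
    # Token-level synonyms: swap one token at a time.
    for i, tok in enumerate(tokens):
        for syn in SYNONYMS.get(tok, []):
            variants.append(" ".join(tokens[:i] + [syn] + tokens[i + 1:]))
    # Phrase-level translations of the whole query.
    variants.extend(TRANSLATIONS.get(normalized, []))
    return variants

expanded = expand("Mountain  Bike")
```

Each variant can then be sent to the search engine as an additional query, which raises recall at the cost of possibly retrieving less relevant items such as locks or frames.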
  • 37. Enabling data science Improving precision - Learning to Rank ‘mountain bike’ Items retrieved
  • 38. Enabling data science Improving precision - Learning to Rank ‘mountain bike’ 2 months ago 30 miles
  • 39. Enabling data science Improving precision - Learning to Rank
  • 40. Enabling data science Improving precision - Learning to Rank ‘bike’ 1 2 3 4 5 6 y = 0 y = 0 y = 1
  • 41. Enabling data science Improving precision - Learning to Rank Conversions on query ‘bike’
  • 42. Enabling data science Improving precision - Learning to Rank SEARCH CONVERSIONS Before After
  • 43. Text score = indicator of relevance. Freshness and distance are key!
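To illustrate the point of slide 43, here is a hand-weighted scoring function that combines text score, freshness, and distance. It is not a Learning to Rank model: a real one would learn the weights from conversion labels such as the y = 0 / y = 1 values on slide 40. The exponential decays, the 30-day and 10-mile scales, the unit weights, and the sample items are all assumptions.

```python
import math

def score(text_score, age_days, distance_miles, w_text=1.0, w_fresh=1.0, w_dist=1.0):
    """Combine text relevance with freshness and proximity (illustrative weights)."""
    freshness = math.exp(-age_days / 30.0)        # 1.0 when just posted, decays with age
    proximity = math.exp(-distance_miles / 10.0)  # 1.0 at the user's location
    return w_text * text_score + w_fresh * freshness + w_dist * proximity

# Hypothetical candidates: a strong text match that is old and far away,
# and a weaker text match that is fresh and close.
items = [
    {"id": "old_far", "text_score": 0.9, "age_days": 60, "distance_miles": 30},
    {"id": "new_near", "text_score": 0.7, "age_days": 1, "distance_miles": 2},
]

ranked = sorted(
    items,
    key=lambda it: score(it["text_score"], it["age_days"], it["distance_miles"]),
    reverse=True,
)
```

With these assumed weights, the fresh nearby listing outranks the two-month-old listing 30 miles away despite its lower text score.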
  • 45. Future of the search platform Work in progress • Migration to Solr 8 (↓latency & better security) • Iterate on Learning to Rank • Real-time personalization • Visual categorization (Reveal)
  • 48. Conclusions Raising the bar • Indexer pipeline enables data enrichment & transformations • Simplified search architecture with lightweight in-memory indices • Fault-tolerant and self-healing infrastructure and processes • Unlock real data science in