1
ZILLOW | TRULIA | STREETEASY | HOTPADS
December 14, 2016
Shruti Kamath (shrutik) & Nicholas Stevens (nicholass)
Recommendations at Zillow
2
Motivation
Modeling
Infrastructure
Future work
3
Zillow Group’s mission stretches across brands
Build the world's largest, most trusted and vibrant home-related marketplace.
4
160M+ monthly unique users
5
110M+ US homes
6
Goal:
Help all of our users seamlessly, efficiently & delightfully discover homes that match their families' needs & lifestyles.
7
Tons of Zillow product areas for recommendations
● Communication Channels
○ Email
○ SMS
○ Push
● Home Details Pages
○ Homes like this
● Search
○ Personalized Search
● Home Page
● Beyond home recommendations
○ Content Recommendations
○ Mortgages
○ Agents
9
Why now?
Important user flows + user and home data + proven algorithms
10
Modeling
11
We want to test & validate models quickly
[Diagram: Offline Testing → Online A/B Testing → 100%, built from Data set, Evaluation Metric, Models, Infrastructure, and Metrics]
12
Components of Home Recs
● Data set, Relevant/Not Relevant, Evaluation Metric
● Collaborative Filtering, Content-based, Other
● A/B Infrastructure, Serving Infrastructure, Metrics, Interleaving
13
Components of Home Recs: Data set
● User interactions (clicks, saves, photo views, etc.)
● Home details (price, location, beds, etc.)
14
The shape of Zillow’s User-Item matrix is unique
[Figure: shape of the user-item matrix (Users × Items), Traditional vs. Zillow]
18
Components of Home Recs: Relevant/Not Relevant
Was this relevant?
20
Components of Home Recs: Data set
Features (name: description)
● uid: unique ID of the user
● pid: property ID
● first_visit: timestamp of first visit, or 0
● num_views: sigmoid(# views)
● time_spent: time on page
● num_contacts: # leads sent
● num_saves: # saves on zpid
● num_shares: # shares on zpid
● num_photos: # photos viewed
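A minimal sketch of how the interaction features listed above could be computed from raw page events. The event schema (keys `uid`, `zpid`, `type`, `ts`, `dwell_seconds`) and the event-type names are hypothetical; the slides do not show Zillow's tracking format. `num_views` follows the slide's sigmoid(#views) literally.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class InteractionFeatures:
    uid: str
    zpid: int
    first_visit: float = 0.0  # timestamp of the earliest view, or 0 if never viewed
    num_views: float = 0.0    # sigmoid(#views), filled in after counting
    time_spent: float = 0.0   # total dwell seconds summed over view events
    num_contacts: int = 0     # leads sent
    num_saves: int = 0
    num_shares: int = 0
    num_photos: int = 0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical event-type names mapped to the counter they increment.
COUNTERS = {
    "contact": "num_contacts",
    "save": "num_saves",
    "share": "num_shares",
    "photo_view": "num_photos",
}


def aggregate(events):
    """Aggregate raw events into one feature row per (uid, zpid) pair."""
    feats = {}
    views = defaultdict(int)
    for e in events:
        key = (e["uid"], e["zpid"])
        f = feats.setdefault(key, InteractionFeatures(e["uid"], e["zpid"]))
        if e["type"] == "view":
            views[key] += 1
            f.time_spent += e.get("dwell_seconds", 0.0)
            if f.first_visit == 0.0 or e["ts"] < f.first_visit:
                f.first_visit = e["ts"]
        elif e["type"] in COUNTERS:
            attr = COUNTERS[e["type"]]
            setattr(f, attr, getattr(f, attr) + 1)
    for key, f in feats.items():
        # Note: sigmoid(0) == 0.5, so a pair with no views gets 0.5 here.
        f.num_views = sigmoid(views[key])
    return feats
```

Because sigmoid(0) = 0.5, pairs with zero views still get 0.5; a variant such as 2·sigmoid(x) − 1 would map zero views to 0, but the slides do not say which form was used.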
21
Components of Home Recs: Relevant/Not Relevant
[Figure: Users × Homes matrix; each user-home cell is labeled Relevant/Not Relevant with a score in [0, 1]]
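The slides label each user-home pair Relevant/Not Relevant on a [0, 1] scale but do not give the rule. As a purely hypothetical illustration, the sketch below treats any explicit action (contact, save, share) as fully relevant and otherwise scales dwell time up to an assumed 120-second cap; the 0.5 threshold used to binarize the score is also an assumption.

```python
def relevance(num_views: int, time_spent: float, num_saves: int,
              num_shares: int, num_contacts: int,
              dwell_cap: float = 120.0) -> float:
    """Hypothetical mapping from engagement on one (user, home) pair to [0, 1]."""
    # Explicit actions are treated as a strong positive signal.
    if num_contacts > 0 or num_saves > 0 or num_shares > 0:
        return 1.0
    # Never viewed: no evidence of interest.
    if num_views == 0:
        return 0.0
    # Otherwise, scale dwell time linearly up to the (assumed) cap.
    return min(time_spent / dwell_cap, 1.0)


def is_relevant(score: float, threshold: float = 0.5) -> bool:
    """Binarize a relevance score; the 0.5 threshold is an assumption."""
    return score >= threshold
```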
23
Components of Home Recs: Collaborative Filtering
● Based on similar user preferences
● Needs: user-item interactions
● Doesn’t need: domain knowledge
24
Collaborative Filtering example
[Figure: click history (or other signals of interest) for User A and User B, and the resulting machine suggestions]
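A minimal user-based collaborative filtering sketch on implicit (binary) click histories, mirroring the User A / User B example: users who clicked similar homes are treated as neighbors, and homes a neighbor clicked that the target user has not are suggested. Cosine similarity on click sets is one common choice, assumed here; the slides do not specify the similarity measure. Home IDs are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two binary interaction sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(history, user, k=3):
    """User-based CF: score unseen homes by the similarity of users who clicked them."""
    seen = history[user]
    scores = defaultdict(float)
    for other, items in history.items():
        if other == user:
            continue
        sim = cosine(seen, items)
        if sim == 0.0:
            continue
        for item in items - seen:
            scores[item] += sim
    # Highest score first; ties broken by ID for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

For example, if User A clicked {h1, h2, h3} and User B clicked {h1, h2, h4}, the sketch suggests h4 to A and h3 to B.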
25
Many options to test collaborative filtering
● Open source:
○ Spark’s collaborative filtering via ALS (spark.mllib.recommendation)
○ LensKit: Java
○ PredictionIO: Spark, HBase, and Spray; acquired by Salesforce
○ Seldon: Java-based, built on Spark
○ GraphLab: SDK available in Java, Python, C++, etc.
● Recs as a service: Amazon ML, GraphLab, Google ML, Azure, etc.
● Algorithms from papers:
○ Restricted Boltzmann Machines & SVD++ (what Netflix was using in 2014)
○ Trulia’s Wedge Counting
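The slides cite Trulia's Wedge Counting without describing it. One plain reading, assumed here, is counting item-user-item "wedges" in the bipartite interaction graph: two homes are related in proportion to the number of users who interacted with both. The sketch below counts wedges and recommends unseen homes by summed counts; Trulia's actual formulation (normalization, weighting, scale-out on Spark) may differ.

```python
from collections import Counter, defaultdict
from itertools import combinations


def wedge_counts(history):
    """co[i][j] = number of users who interacted with both homes i and j.

    Each such user closes one i-user-j wedge in the bipartite graph.
    """
    co = defaultdict(Counter)
    for items in history.values():
        for i, j in combinations(sorted(items), 2):
            co[i][j] += 1
            co[j][i] += 1
    return co


def recommend_from_wedges(co, seen, k=3):
    """Rank unseen homes by total wedge count with the homes in `seen`."""
    scores = Counter()
    for i in seen:
        for j, c in co.get(i, {}).items():
            if j not in seen:
                scores[j] += c
    # Highest count first; ties broken by ID for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [j for j, _ in ranked[:k]]
```

Raw counts favor popular homes; dividing by each home's degree (a cosine- or Jaccard-style normalization) is a common adjustment if that bias matters.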
27
Components of Home Recs: Content-based
● Based on specific home features
● Needs: domain knowledge
● Doesn’t need: user-item interactions
28
Content-based modeling example
[Figure: a catalog of tagged items; the user showed interest in an item tagged "Fruit, Berry, Red", and the machine suggests another item with the same tags "Fruit, Berry, Red"]
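A minimal content-based sketch of the tagged-catalog example: the user's profile is the union of tags on items they showed interest in, and unseen catalog items are ranked by Jaccard similarity to that profile. The item names and the choice of Jaccard similarity are illustrative assumptions.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union of two tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_recs(catalog, liked, k=2):
    """Rank catalog items the user hasn't interacted with by tag overlap with a profile."""
    profile = set().union(*(catalog[name] for name in liked))
    candidates = [
        (jaccard(profile, tags), name)
        for name, tags in catalog.items()
        if name not in liked
    ]
    candidates.sort(key=lambda t: (-t[0], t[1]))
    # Drop items with no tag overlap at all.
    return [name for score, name in candidates[:k] if score > 0]
```

With a catalog of strawberry and raspberry (both Fruit, Berry, Red), blueberry (Fruit, Berry, Blue), and carrot (Vegetable, Orange), a user who liked strawberry gets raspberry first, then blueberry.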
29
Components of Home Recs: Content-based
Type: Features (categorical variables)
● Bath: 0_bath, 0.5_bath, 1_bath, 1.5_bath, 2_bath, 2.5_bath, 3_bath
● Bed: 0_bed, 1_bed, 2_bed, 3_bed, 4_bed, 5_bed
● Price: 100_125_price, 125_150_price, 150_175_price
● Use Code: condo, single_family, farm_land
● Zipcode: zip_98109
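A sketch of turning a home into the categorical tokens shown above. The token formats follow the slide; the rest is assumed: price bounds are taken to be thousands of dollars in 25k-wide buckets, baths are rounded down to the nearest half and capped at 3, and beds are capped at 5, matching the largest values listed. The input dict keys are hypothetical.

```python
import math


def bucket_price(price_usd, width_k=25):
    """Price bucket token, e.g. 130000 -> '125_150_price'.

    Assumes bounds are in thousands of dollars and buckets are width_k wide.
    """
    lo = int(price_usd // 1000 // width_k) * width_k
    return f"{lo}_{lo + width_k}_price"


def bath_token(baths, cap=3.0):
    """Round down to the nearest half bath and cap, e.g. 1.75 -> '1.5_bath'."""
    b = min(math.floor(baths * 2) / 2, cap)
    return f"{b:g}_bath"


def home_tokens(home, max_beds=5):
    """Categorical tokens for one home, in the formats shown on the slide."""
    return [
        bath_token(home["baths"]),
        f"{min(int(home['beds']), max_beds)}_bed",
        bucket_price(home["price"]),
        home["use_code"],          # e.g. condo, single_family, farm_land
        f"zip_{home['zipcode']}",
    ]
```

Each token can then be one-hot encoded against a vocabulary of all tokens, giving the sparse feature vector a content-based model compares.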
Components of Home Recs: Other
● Types of properties
● Stage of buying (early vs. late)
● Region differences
● Communication preferences
32
Components of Home Recs: A/B Infrastructure
[Figure: A/B test of "New Recs!" emails. Email A recommends Home1, Home2, and Home3, all from Model X; Email B recommends Home1, Home2, and Home3, all from Model Y]
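The A/B setup sends each user one variant of the email. A common way to do this, assumed here rather than taken from the slides, is to hash a stable user ID together with an experiment name so that assignment is deterministic across sends and independent across experiments. The experiment name and the `MODEL_FOR_VARIANT` mapping are hypothetical.

```python
import hashlib


def ab_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically assign a user to a variant of one experiment."""
    # SHA-256 is stable across processes, unlike Python's salted built-in hash().
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Hypothetical mapping from variant to the model that fills the email.
MODEL_FOR_VARIANT = {"A": "ModelX", "B": "ModelY"}


def model_for_user(user_id: str, experiment: str = "new_recs_email") -> str:
    return MODEL_FOR_VARIANT[ab_variant(user_id, experiment)]
```

Salting the hash with the experiment name keeps a user's bucket in one test from predicting their bucket in the next.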
33
Components of Home Recs: Interleaving
[Figure: interleaving. Homes recommended by Model X and Model Y are combined into one "New Recs!" email, each home tagged with the model that recommended it]
34
Background on Interleaving
● Blog introduction to interleaving (2008):
http://glinden.blogspot.com/2008/11/testing-rankers-by-interleaving-search.html
● Paper comparing interleaving to other, more traditional methods (2010):
https://www.microsoft.com/en-us/research/wp-content/uploads/2010/07/fp146-radlinski.pdf
○ 5,000 judged queries ≈ 50,000 user impressions
● Comprehensive review of interleaving (2012):
http://www.cs.cornell.edu/people/tj/publications/chapelle_etal_12a.pdf
● Comparison of specific interleaving algorithms, i.e., click aggregation schemes (2013):
https://www.microsoft.com/en-us/research/wp-content/uploads/2013/02/Radlinski_Optimized_WSDM2013.pdf.pdf
● Comprehensive guide to online evaluation techniques, including interleaving:
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/ftir-online-evaluation-final-journal.pdf
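Team-draft interleaving (Radlinski, Kurup & Joachims, 2008) is one well-known way to build a single list from two rankers and credit clicks back to them. The sketch below is a minimal version of that algorithm, not necessarily the variant used at Zillow; item IDs and the tie-handling in `credit` are illustrative.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, k, rng=random):
    """Team-draft interleaving of two rankings into at most k items.

    Returns the interleaved list and a dict mapping each item to the team
    ("A" or "B") whose ranker contributed it.
    """
    interleaved, team = [], {}
    count_a = count_b = 0
    ia = ib = 0
    while len(interleaved) < k:
        # Advance each ranker past items that are already placed.
        while ia < len(ranking_a) and ranking_a[ia] in team:
            ia += 1
        while ib < len(ranking_b) and ranking_b[ib] in team:
            ib += 1
        a_left, b_left = ia < len(ranking_a), ib < len(ranking_b)
        if not (a_left or b_left):
            break
        # The team with fewer picks goes next; ties are broken by a coin flip.
        a_turn = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        if a_left and (a_turn or not b_left):
            item = ranking_a[ia]
            team[item] = "A"
            count_a += 1
        else:
            item = ranking_b[ib]
            team[item] = "B"
            count_b += 1
        interleaved.append(item)
    return interleaved, team


def credit(team, clicked):
    """Return the team with more clicked items, or "tie"."""
    a = sum(1 for item in clicked if team.get(item) == "A")
    b = sum(1 for item in clicked if team.get(item) == "B")
    if a == b:
        return "tie"
    return "A" if a > b else "B"
```

Aggregating `credit` outcomes over many users (wins for A vs. wins for B) gives the preference signal; the 2013 paper linked above compares such aggregation schemes.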
35
Components of Home Recs: Interleaving
[Figure: interleaving with three models. Homes recommended by Model X, Model Y, and Model Z are combined into "New Recs!" emails, each home tagged with the model that recommended it]
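With three models (X, Y, Z), the same idea extends to multileaving. Team-draft multileaving (Schuth et al., 2014) is one such extension, assumed here: in each round the models pick in a random order, each taking its highest-ranked home not yet placed. This is a minimal sketch; the slides do not say which multileaving scheme, if any, was used.

```python
import random


def team_draft_multileave(rankings, k, rng=None):
    """Team-draft multileaving of several named rankings into at most k items.

    rankings: dict mapping a model name to its ranked list of item IDs.
    Returns the combined list and a dict mapping each item to its model.
    """
    rng = rng or random.Random()
    result, team = [], {}
    pos = {name: 0 for name in rankings}
    while len(result) < k:
        order = list(rankings)
        rng.shuffle(order)  # random pick order for this round
        placed_any = False
        for name in order:
            if len(result) >= k:
                break
            ranked = rankings[name]
            # Skip items another model has already placed.
            while pos[name] < len(ranked) and ranked[pos[name]] in team:
                pos[name] += 1
            if pos[name] < len(ranked):
                item = ranked[pos[name]]
                team[item] = name
                result.append(item)
                placed_any = True
        if not placed_any:
            break  # every ranking is exhausted
    return result, team
```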
37
Infrastructure
Home Recommendations architecture
● Recommendation API (Java)
● Zillow Group Data Lake (S3 / Kinesis)
● Property Featurization (Spark EMR)
● User Profiles (Spark EMR)
● Ranking (Spark EMR)
● Wedge Counting Collaborative Filtering (Spark EMR)
● Property Aggregate Features (Spark EMR)
● Data Collection Systems (Java/Python/SQL)
39
Airflow
40
Machine Learning pipeline
41
Machine Learning pipeline - ETL
42
Our top data sources (data: source; volume per day; location; frequency)
● User-Property Interactions: Google Analytics & Kinesis stream; 1 TB; S3; batch & real-time
● Property: third-party, county, banks, users, listings; 12 GB; S3; real-time
● Posting History: MLS, brokers, listings; 12 GB; S3; real-time
43
Data processing
[Figure: Google Analytics user events (uid, zpid, timestamp, price, device, event, event type, event label, ...) are split by timestamp into training, validation, and test sets. Events are aggregated per user and property (uid, zpid, dwell time, num_shares, ...), combined with property fields (zipid, countyid, ...), and filtered into subsets 1 to n]
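The figure splits user events into training, validation, and test sets by timestamp. A minimal sketch of such a chronological split, assuming events carry a `timestamp` field and that the two cut-off dates are chosen by the modeler (both hypothetical here). Splitting on time rather than at random keeps future interactions out of training.

```python
from datetime import datetime


def time_split(events, valid_start: datetime, test_start: datetime):
    """Split events chronologically into (train, validation, test) lists.

    Events before valid_start go to training, events in
    [valid_start, test_start) to validation, and the rest to test.
    """
    if valid_start > test_start:
        raise ValueError("valid_start must not be after test_start")
    train, valid, test = [], [], []
    for event in events:
        ts = event["timestamp"]
        if ts < valid_start:
            train.append(event)
        elif ts < test_start:
            valid.append(event)
        else:
            test.append(event)
    return train, valid, test
```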
44
Data processing
[Figure: subsets 1 to n (uid, zpid, dwell time, ..., zipid, countyid, ...) are filtered by region and then by zuids and zpids, producing training data and prediction data for each subset]
45
Scala is a first-class citizen in Spark
● Functional
○ Composability (building blocks)
○ Easy to parallelize
● Type-safe
● Ship features and improvements quickly
● Easier code deployment through the JVM
● Using the Datasets API
○ Compile-time safety for syntax and analysis errors
○ Structured Streaming
46
Datasets (Spark 2.0) are very promising
Pros:
● Compile-time safety
● Domain-specific operations
● Composable (the output of one process is the input to another)
● Uses the Catalyst Optimizer
● Structured Streaming
Cons:
● Dataset APIs are still in the experimental phase
● Only supported for primitive types and case classes
● Joins are not type-safe
● Complex types such as Iterators are not supported in the current version and require custom encoders
47
We still use RDDs where Datasets are challenging
[Figure: Dataset vs. RDD (Resilient Distributed Dataset)]
48
Machine Learning pipeline - Training
49
Machine Learning pipeline - Training
50
Training
[Figure: training steps, each starting with "Load subset"]
51
Machine Learning pipeline - Prediction
52
Machine Learning pipeline - Prediction
53
Prediction
[Figure: prediction step starting with "Load prediction data"]
54
Machine Learning pipeline - Evaluation
55
Machine Learning pipeline
56
Our top model evaluation metrics (offline)
● Precision@k: $r_k / k$, where $r_k$ = # recommended properties in the test set within the top k
● Recall@k: $r_k / n$, where $n$ = total properties in the test set
● Freshness: # of listings in the top k whose modified date is less than d days old
● Recs Coverage: # unique listings recommended across all users in the top k
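A minimal sketch of the four offline metrics as defined above, for one user's ranked list (precision, recall, freshness) and across users (coverage). The input shapes (lists of property IDs, a dict of modified dates) are assumptions; averaging per-user precision and recall over users is a common convention but is not stated on the slide.

```python
from datetime import datetime, timedelta


def precision_recall_at_k(recommended, test_items, k):
    """Precision@k = r_k / k and recall@k = r_k / n for one user."""
    relevant = set(test_items)
    r_k = len(set(recommended[:k]) & relevant)  # recommended properties in the test set, top k
    n = len(relevant)                           # total properties in the test set
    precision = r_k / k if k > 0 else 0.0
    recall = r_k / n if n > 0 else 0.0
    return precision, recall


def freshness_at_k(recommended, modified, k, d, now: datetime):
    """# of top-k listings whose modified date is less than d days old."""
    cutoff = now - timedelta(days=d)
    return sum(1 for pid in recommended[:k] if modified[pid] > cutoff)


def coverage_at_k(recs_by_user, k):
    """# unique listings recommended across all users in their top k."""
    return len({pid for recs in recs_by_user.values() for pid in recs[:k]})
```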
57
Future work
58
Other problems we’re tackling
● Classifiers for listing descriptions
● Deep learning on listing images
● Structured streaming on Spark 2.0
● Real-time scoring
We’re hiring Data Scientists & Engineers! zillow.com/jobs
Questions? shrutik@zillow & nicholass@zillow
59
Recommendations team
Data Science
● Imri S
● Alex C
● Shruti K
● Nicholas S
Personalization Product team
● Hao X
● Purvag P
● Kurtis M
● XiaoXi G
● Ming-Li L
● Jason C
● Andy W
● Eric T
● Andrew M
And others!
60
That’s it!

More Related Content

What's hot
● Splunk as a_big_data_platform_for_developers_spring_one2gx (Damien Dallimore)
● The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight (Databricks)
● Building Applications with a Graph Database (Tobias Lindaaker)
● NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020 (Timothy McAliley)
● Graph Gurus 15: Introducing TigerGraph 2.4 (TigerGraph)
● Adobe Behance Scales to Millions of Users at Lower TCO with Neo4j (Neo4j)
● AstraZeneca - Re-imagining the Data Landscape in Compound Synthesis & Management (Neo4j)
● Fraud Detection and Neo4j (Max De Marzi)
● Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes... (Dr. Arif Wider)
● Data Warehousing using Hadoop (DataWorks Summit)
● An overview of Neo4j Internals (Tobias Lindaaker)
● Neo4j Graph Platform Overview, Kurt Freytag, Neo4j (Neo4j)
● Cassandra serving netflix @ scale (Vinay Kumar Chella)
● Stream Processing – Concepts and Frameworks (Guido Schmutz)
● Apache Spark Overview (Vadim Y. Bichutskiy)
● Building Data Quality pipelines with Apache Spark and Delta Lake (Databricks)
● Big Data Analytics Strategy and Roadmap (Srinath Perera)
● Designing the Next Generation of Data Pipelines at Zillow with Apache Spark (Databricks)
● 3D: DBT using Databricks and Delta (Databricks)
● Part 3 - Modern Data Warehouse with Azure Synapse (Nilesh Gule)

Viewers also liked
● Data Science At Zillow (Nicholas McClure)
● Filtros Colaborativos y Sistemas de Recomendación (Gabriel Huecas)
● Zillow Premier Agent (Brad Andersohn)
● Small Data: a Brief History and a New Design Philosophy (Allen Bonde)
● What We Thought We Knew: Surprising Truths About Buyers and Sellers (Premier Agent | Zillow & Trulia)
● Leveraging Advertising And Technology To Scale Your Business (Premier Agent | Zillow & Trulia)
● Staying Ahead in a World of Change (Premier Agent | Zillow & Trulia)
● Database Design (sariatiazman)
● Why Propertybase [WEBINAR] (Propertybase)
● Real estate mediume (ITAAKASH STRATEGIC)
● Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of (Charles Givre)
● Introduction to Real-time data processing (Yogi Devendra Vyavahare)
● AWS re:Invent 2016: Zillow Group: Developing Classification and Recommendatio... (Amazon Web Services)
● Using Docker for GPU Accelerated Applications (NVIDIA)
● Visual Design with Data (Seth Familian)

Similar to Recommendations at Zillow
● DB design (fikirabc)
● Fairness, Transparency, and Privacy in AI @ LinkedIn (Krishnaram Kenthapadi)
● Valuing the data asset (Bala Iyer)
● Data Science, Personalisation & Product management (Bhaskar Krishnan)
● Data Catalogues - Architecting for Collaboration & Self-Service (DATAVERSITY)
● Presentasi 1 - Business Intelligence (DEDE IRYAWAN)
● Disrupting Data Discovery (markgrover)
● Liggett Methods And Tools Slides Q1 2011 (tliggett)
● Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC) (Denodo)
● Knowledge Graphs Webinar- 11/7/2017 (Neo4j)
● Supercharge Your Corporate Dashboards With UX Analytics (UserZoom)
● Power Up Competitive Price Intelligence with Web Data (Connotate)
● Focused Crawling for Structured Data (Robert Meusel)
● Mastering Customer Data on Apache Spark (Caserta)
● Group 3 slide presentation (Michael Young)
● Implementing Advanced Analytics Platform (Arvind Sathi)
● Enterprise Data Marketplace: A Centralized Portal for All Your Data Assets (Denodo)
● Presentation by Luc Delanglez (DataLumen) at the Data Vault Modelling and Dat... (Patrick Van Renterghem)
● uae views on big data (Aravindharamanan S)
● Neo4j – The Fastest Path to Scalable Real-Time Analytics (Neo4j)
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 

Recently uploaded (20)

A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 

Recommendations at Zillow

  • 1. Recommendations at Zillow. ZILLOW | TRULIA | STREETEASY | HOTPADS. December 14, 2016. Shruti Kamath (shrutik) & Nicholas Stevens (nicholass)
  • 3. Zillow Group’s mission stretches across brands: build the world's largest, most trusted and vibrant home-related marketplace.
  • 6. Goal: help all of our users seamlessly, efficiently & delightfully discover homes that match their family’s needs & lifestyle.
  • 7. Many Zillow product areas can use recommendations
    ● Communication channels
      ○ Email
      ○ SMS
      ○ Push
    ● Home details pages
      ○ Homes like this
    ● Search
      ○ Personalized search
    ● Home page
    ● Beyond home recommendations
      ○ Content recommendations
      ○ Mortgages
      ○ Agents
  • 9. Why now? We have important user flows, rich user and home data, and proven algorithms.
  • 11. We want to test & validate models quickly: offline testing (data set, evaluation metric, models) → online A/B testing (infrastructure, metrics) → 100%.
  • 12. Components of Home Recs
    ● Data set · Relevant/Not Relevant · Evaluation Metric
    ● Collaborative Filtering · Content-based · Other
    ● A/B Infrastructure · Serving Infrastructure · Metrics · Interleaving
  • 13. Components of Home Recs: data set
    ● User interactions (clicks, saves, photo views, etc.)
    ● Home details (price, location, beds, etc.)
  • 14–17. The shape of Zillow’s user-item matrix is unique (figures: a traditional users × items matrix compared with Zillow’s users × items matrix)
  • 18. Components of Home Recs: relevant/not relevant. Was this relevant?
  • 20. Components of Home Recs (feature: description)
    ● uid: unique ID of the user
    ● pid: property ID
    ● first_visit: timestamp, or 0
    ● num_views: sigmoid(# views)
    ● time_spent: time on page
    ● num_contacts: # leads sent
    ● num_saves: # saves on the zpid
    ● num_shares: # shares on the zpid
    ● num_photos: # photos viewed
  • 21. Components of Home Recs (figure: a users × homes matrix whose entries are relevant/not relevant scores in [0...1])
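The deck lists the interaction features and says each user-home pair gets a relevance score in [0...1], but it does not say how the features are combined. A minimal sketch, assuming a logistic combination of the slide-20 features: the weights, bias, time unit and 0.5 threshold are all hypothetical, and a production model would learn them rather than hard-code them.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, mapping any real into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


@dataclass
class Interaction:
    """One user-home record; field names follow the feature table above."""
    uid: str
    pid: str
    num_views: int       # raw count; the slide lists sigmoid(#views) as the feature
    time_spent: float    # seconds on the page (unit is an assumption)
    num_contacts: int    # leads sent
    num_saves: int
    num_shares: int
    num_photos: int


# HYPOTHETICAL weights and bias, for illustration only (not Zillow's values).
WEIGHTS = {"views": 1.0, "time_spent": 0.01, "contacts": 3.0,
           "saves": 2.0, "shares": 1.5, "photos": 0.1}
BIAS = -3.0


def relevance(it: Interaction) -> float:
    """Score one user-home pair between 0 and 1 from its interaction features."""
    z = (BIAS
         + WEIGHTS["views"] * sigmoid(it.num_views)
         + WEIGHTS["time_spent"] * it.time_spent
         + WEIGHTS["contacts"] * it.num_contacts
         + WEIGHTS["saves"] * it.num_saves
         + WEIGHTS["shares"] * it.num_shares
         + WEIGHTS["photos"] * it.num_photos)
    return sigmoid(z)


def is_relevant(score: float, threshold: float = 0.5) -> bool:
    """Binarize a score into relevant / not relevant (threshold is assumed)."""
    return score >= threshold


if __name__ == "__main__":
    quick_look = Interaction("u1", "p1", 1, 10.0, 0, 0, 0, 0)
    engaged = Interaction("u1", "p2", 5, 300.0, 1, 1, 0, 20)
    for it in (quick_look, engaged):
        print(it.pid, round(relevance(it), 3), is_relevant(relevance(it)))
```

Contacts and saves get the largest hypothetical weights because they are stronger intent signals than a single view; that ordering is a design guess, not something the deck states.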
  • 23. Components of Home Recs: collaborative filtering
    ● Based on similar user preferences
    ● Needs: user & item interactions
    ● Doesn't need: domain knowledge
  • 24. Collaborative filtering example (figure: the click history, or other signals of interest, of User A and User B feeds machine suggestions)
  • 25. Many options to test collaborative filtering
    ● Open source:
      ○ Spark’s collaborative filtering via ALS (spark.mllib.recommendation)
      ○ LensKit: Java
      ○ PredictionIO: Spark, HBase and Spray; acquired by Salesforce
      ○ Seldon: Java based, built on Spark
      ○ GraphLab: SDK available in Java, Python, C++, etc.
    ● Recs as a service: Amazon ML, GraphLab, Google ML, Azure, etc.
    ● Algorithms from papers:
      ○ Restricted Boltzmann Machines & SVD++ (what Netflix was using in 2014)
      ○ Trulia’s Wedge Counting
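To make the User A / User B example concrete, here is a minimal item-item co-occurrence sketch in plain Python: homes that other users touched together with the target user's homes are ranked by how often they co-occur. This is a generic collaborative-filtering illustration, not Spark ALS and not Trulia's Wedge Counting; the function names and the (uid, zpid) input format are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(interactions):
    """Return (homes_by_user, counts) where counts[(a, b)] is the number of
    users who interacted with both home a and home b."""
    homes_by_user = defaultdict(set)
    for uid, zpid in interactions:
        homes_by_user[uid].add(zpid)

    counts = defaultdict(int)
    for homes in homes_by_user.values():
        # Every unordered pair of homes touched by the same user co-occurs once.
        for a, b in combinations(sorted(homes), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return homes_by_user, counts


def recommend(uid, interactions, k=3):
    """Rank homes the user has not seen by co-occurrence with homes they have."""
    homes_by_user, counts = cooccurrence_counts(interactions)
    seen = homes_by_user.get(uid, set())
    scores = defaultdict(int)
    for (a, b), c in counts.items():
        if a in seen and b not in seen:
            scores[b] += c
    # Highest score first; ties broken by id so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [zpid for zpid, _ in ranked[:k]]


if __name__ == "__main__":
    clicks = [("userA", "h1"), ("userA", "h2"), ("userA", "h3"),
              ("userB", "h1"), ("userB", "h2"),
              ("userC", "h2"), ("userC", "h4")]
    print(recommend("userB", clicks))
```

Enumerating every pair per user is quadratic in that user's history, so at the data volumes the deck describes this would run as a distributed job (for example on Spark) rather than in a single process.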
  • 27. Components of Home Recs: content-based
    ● Based on specific home features
    ● Needs: domain knowledge
    ● Doesn't need: interactions
  • 28. Content-based modeling example (figure: a user showed interest in an item tagged “Fruit, Berry, Red”; from a catalog of tagged items, the machine suggests an item with the same tags)
  • 29. Components of Home Recs: content-based features (type: categorical variables)
    ● Bath: 0_bath, 0.5_bath, 1_bath, 1.5_bath, 2_bath, 2.5_bath, 3_bath
    ● Bed: 0_bed, 1_bed, 2_bed, 3_bed, 4_bed, 5_bed
    ● Price: 100_125_price, 125_150_price, 150_175_price
    ● Use code: condo, single_family, farm_land
    ● Zipcode: zip_98109
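The categorical features above can drive a simple content-based ranker: turn each home into a set of tags, build a user profile by counting the tags of homes they liked, and rank candidates by cosine similarity to that profile. A minimal sketch; the 25k price-bucket width read in thousands of dollars is inferred from the example buckets, and the function names are hypothetical.

```python
import math
from collections import Counter


def home_tags(beds, baths, price, use_code, zipcode, bucket=25_000):
    """Turn raw home attributes into categorical tags in the slide's style.

    ASSUMPTION: price buckets are 25k wide and labeled in thousands of
    dollars, inferred from 100_125_price, 125_150_price, ...
    """
    lo = (int(price) // bucket) * bucket // 1000
    hi = lo + bucket // 1000
    return {
        f"{beds}_bed",
        f"{baths:g}_bath",   # 2 -> "2_bath", 2.5 -> "2.5_bath"
        f"{lo}_{hi}_price",
        use_code,
        f"zip_{zipcode}",
    }


def cosine(profile, tags):
    """Cosine similarity between a tag-count profile and a binary tag set."""
    dot = sum(profile[t] for t in tags)
    norm_profile = math.sqrt(sum(v * v for v in profile.values()))
    norm_tags = math.sqrt(len(tags))
    if norm_profile == 0 or norm_tags == 0:
        return 0.0
    return dot / (norm_profile * norm_tags)


def rank_by_content(liked_homes, candidates):
    """Rank (zpid, tags) candidates by similarity to the homes a user liked."""
    profile = Counter(tag for tags in liked_homes for tag in tags)
    scored = [(cosine(profile, tags), zpid) for zpid, tags in candidates]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [zpid for _, zpid in scored]


if __name__ == "__main__":
    liked = [home_tags(3, 2, 130_000, "condo", 98109)]
    candidates = [("b", home_tags(5, 3, 400_000, "single_family", 98052)),
                  ("a", home_tags(3, 2, 140_000, "condo", 98109))]
    print(rank_by_content(liked, candidates))
```

Unlike the co-occurrence approach, this needs no other users' interactions, which matches the slide's "doesn't need: interactions"; the trade-off is that the tag design encodes domain knowledge.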
  • 30. Components of Home Recs: other
    ● Types of properties
    ● Stage of buying (early vs. late)
    ● Region differences
    ● Communication preferences
  • 32. Components of Home Recs: A/B testing
    ● Email (A) “New Recs!”: Home1, Home2, Home3, all from ModelX
    ● vs. Email (B) “New Recs!”: Home1, Home2, Home3, all from ModelY
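An A/B test like the two emails above needs a stable way to decide which users get ModelX and which get ModelY. One common approach, sketched here under assumptions (the deck does not describe Zillow's A/B infrastructure), is to hash the experiment name together with the user id; the experiment name and helper names below are hypothetical.

```python
import hashlib

# Which model fills the "New Recs!" email for each variant, per the slide.
MODEL_FOR_VARIANT = {"A": "ModelX", "B": "ModelY"}


def assign_variant(uid: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a user to one variant of a named experiment.

    Hashing experiment + uid (rather than uid alone) keeps assignments
    independent across experiments.
    """
    key = f"{experiment}:{uid}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


def model_for_user(uid: str, experiment: str = "email_recs_test") -> str:
    """Pick the model that generates this user's email (hypothetical helper)."""
    return MODEL_FOR_VARIANT[assign_variant(uid, experiment)]


if __name__ == "__main__":
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, model_for_user(uid))
```

Because assignment is a pure function of the ids, a user keeps seeing the same variant across emails without any stored state.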
  • 33. Components of Home Recs
    ● Email “New Recs!”: Home1, Home2, Home3, all from ModelX
    ● vs. Email “New Recs!”: Home1, Home2, Home3, all from ModelY
  • 34. Background on interleaving
    ● Blog introduction to interleaving (2008): http://glinden.blogspot.com/2008/11/testing-rankers-by-interleaving-search.html
    ● Paper comparing interleaving to other, more traditional methods (2010): https://www.microsoft.com/en-us/research/wp-content/uploads/2010/07/fp146-radlinski.pdf
      ○ 5,000 judged queries ≈ 50,000 user impressions
    ● Paper with a comprehensive review of interleaving (2012): http://www.cs.cornell.edu/people/tj/publications/chapelle_etal_12a.pdf
    ● Comparison of specific interleaving algorithms (click aggregation schemes, 2013): https://www.microsoft.com/en-us/research/wp-content/uploads/2013/02/Radlinski_Optimized_WSDM2013.pdf.pdf
    ● Comprehensive guide to evaluation techniques, including interleaving: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/ftir-online-evaluation-final-journal.pdf
  • 35. Components of Home Recs: interleaving
    ● Email “New Recs!”: Home1 (ModelX), Home2 (ModelX), Home3 (ModelZ)
    ● vs. Email “New Recs!”: Home1 (ModelZ), Home2 (ModelY), Home3 (ModelY)
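Interleaving mixes two models' results into a single list and credits each click to the model that contributed the clicked home. The deck does not say which interleaving scheme Zillow used; the sketch below implements team-draft interleaving, one of the schemes covered in the literature linked above, with hypothetical function names.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, k, rng):
    """Team-draft interleaving of two rankings into one list of length <= k.

    Returns (interleaved, team), where team maps each shown item to "A" or "B",
    the ranker whose draft pick placed it in the list.
    """
    interleaved, team = [], {}
    picks = {"A": 0, "B": 0}

    def next_unused(ranking):
        # Highest-ranked item from this ranker that is not already shown.
        return next((item for item in ranking if item not in team), None)

    while len(interleaved) < k:
        next_a, next_b = next_unused(ranking_a), next_unused(ranking_b)
        if next_a is None or next_b is None:
            break
        # The team with fewer picks goes next; a coin flip breaks ties.
        if picks["A"] < picks["B"] or (picks["A"] == picks["B"] and rng.random() < 0.5):
            side, item = "A", next_a
        else:
            side, item = "B", next_b
        interleaved.append(item)
        team[item] = side
        picks[side] += 1
    return interleaved, team


def winner(team, clicked):
    """Credit each click to the team that contributed the clicked item."""
    clicks_a = sum(1 for item in clicked if team.get(item) == "A")
    clicks_b = sum(1 for item in clicked if team.get(item) == "B")
    if clicks_a == clicks_b:
        return "tie"
    return "A" if clicks_a > clicks_b else "B"


if __name__ == "__main__":
    rng = random.Random(0)
    shown, team = team_draft_interleave(["h1", "h2", "h3", "h4"],
                                        ["h3", "h5", "h1", "h6"], 4, rng)
    print(shown, team, winner(team, shown[:1]))
```

Aggregating `winner` over many emails gives a per-comparison preference; because every user sees both models in one list, this typically needs far less traffic than a split A/B test, which is the point of the 5,000 vs. 50,000 comparison cited above.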
  • 38. Home Recommendations architecture (components)
    ● Recommendation API (Java)
    ● Zillow Group Data Lake (S3 / Kinesis)
    ● Property Featurization (Spark EMR)
    ● User Profiles (Spark EMR)
    ● Ranking (Spark EMR)
    ● Wedge Counting Collaborative Filtering (Spark EMR)
    ● Property Aggregate Features (Spark EMR)
    ● Data Collection Systems (Java/Python/SQL)
  • 42. Our top data sources (data | source | volume per day | location | frequency)
    ● User-property interactions | Google Analytics & Kinesis stream | 1 TB | S3 | batch & real-time
    ● Property | third-party, county, banks, users, listings | 12 GB | S3 | real-time
    ● Posting history | MLS, brokers, listings | 12 GB | S3 | real-time
  • 43. Data processing (figure): Google Analytics user events (uid, zpid, timestamp, price, device, event, event type, event label, ...) are split by timestamp into training, validation and test sets and filtered into subsets 1 to n; the events are aggregated per user and property (uid, zpid, dwell time, num_shares, ...) and combined with property data (uid, zpid, dwell time, ..., zipid, countyid, ...).
  • 44. Data processing, continued (figure): the combined records (uid, zpid, dwell time, ..., zipid, countyid, ...) are filtered by region into subsets 1 to n, then filtered by zuids and zpids to produce training data and prediction data for each subset.
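The pipeline in slides 43 and 44 runs on Spark, but its core steps (a time-based train/validation/test split, per-(uid, zpid) aggregation, and filtering by user and property ids) can be sketched in plain Python. The event fields, units, and the "share" event label are assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    uid: str
    zpid: str
    timestamp: int            # epoch seconds (assumed)
    event_type: str           # hypothetical labels such as "view" or "share"
    dwell_time: float = 0.0   # seconds (assumed)


def time_split(events, valid_start, test_start):
    """Split by time: train < valid_start <= valid < test_start <= test."""
    train = [e for e in events if e.timestamp < valid_start]
    valid = [e for e in events if valid_start <= e.timestamp < test_start]
    test = [e for e in events if e.timestamp >= test_start]
    return train, valid, test


def aggregate(events):
    """Aggregate raw events into one row per (uid, zpid)."""
    rows = defaultdict(lambda: {"dwell_time": 0.0, "num_shares": 0})
    for e in events:
        row = rows[(e.uid, e.zpid)]
        row["dwell_time"] += e.dwell_time
        if e.event_type == "share":
            row["num_shares"] += 1
    return dict(rows)


def filter_ids(rows, keep_uids, keep_zpids):
    """Keep only rows whose user and property are in the given id sets."""
    return {(uid, zpid): row for (uid, zpid), row in rows.items()
            if uid in keep_uids and zpid in keep_zpids}


if __name__ == "__main__":
    events = [Event("u1", "p1", 100, "view", 30.0), Event("u1", "p1", 110, "share"),
              Event("u2", "p2", 150, "view", 5.0), Event("u1", "p3", 250, "view", 12.0)]
    train, valid, test = time_split(events, valid_start=200, test_start=300)
    print(aggregate(train))
```

Splitting by time rather than at random keeps later behavior out of training, so offline metrics better reflect predicting a user's future interest.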
  • 45. Scala is a first-class citizen in Spark
    ● Functional
      ○ Composability (building blocks)
      ○ Easy to parallelize
    ● Type safe
    ● Ship features and improvements quickly
    ● Deploying code through the JVM is easier
    ● Using the Datasets API
      ○ Compile-time safety for syntax and analysis errors
      ○ Structured Streaming
  • 46. The Datasets API (Spark 2.0) is very promising
    Pros:
    ● Compile-time safety
    ● Domain-specific operations
    ● Composable (the output of one process is the input to another)
    ● Uses the Catalyst optimizer
    ● Structured Streaming
    Cons:
    ● Dataset APIs are still in an experimental phase
    ● Only supported for primitive types and case classes
    ● Joins are not type-safe
    ● Complex types such as Iterators are not supported in the current version and require custom encoders
  • 47. We still use RDDs where the Dataset API is challenging (figure: Dataset vs. RDD (Resilient Distributed Dataset))
  • 56. Our top model evaluation metrics (offline)
    ● Precision@k = $r_k / k$, where $r_k$ = # recommended properties in the top k that are in the test set
    ● Recall@k = $r_k / n$, where $n$ = total properties in the test set
    ● Freshness: # of listings in the top k recs whose modified date is less than d days old
    ● Coverage: # unique listings recommended across all users in the top k
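The four offline metrics translate directly into code. A minimal sketch, assuming recommendations are ordered lists of listing ids and that days-since-modified is available per listing; the function names are hypothetical, and freshness and coverage are returned as counts, as the slide defines them.

```python
def precision_recall_at_k(recommended, test_set, k):
    """Precision@k = r_k / k and recall@k = r_k / n, where r_k is the number of
    top-k recommendations found in the test set and n = len(test_set)."""
    r_k = sum(1 for p in recommended[:k] if p in test_set)
    precision = r_k / k if k > 0 else 0.0
    recall = r_k / len(test_set) if test_set else 0.0
    return precision, recall


def freshness_at_k(recommended, days_since_modified, k, d):
    """Number of top-k recommended listings modified fewer than d days ago."""
    return sum(1 for p in recommended[:k] if days_since_modified[p] < d)


def coverage_at_k(recs_by_user, k):
    """Number of unique listings appearing in any user's top-k recommendations."""
    return len({p for recs in recs_by_user.values() for p in recs[:k]})


if __name__ == "__main__":
    recs = ["h1", "h2", "h3", "h4"]
    print(precision_recall_at_k(recs, {"h2", "h4", "h9"}, k=3))
    print(freshness_at_k(recs, {"h1": 1, "h2": 10, "h3": 3, "h4": 0}, k=3, d=7))
    print(coverage_at_k({"u1": ["h1", "h2", "h3"], "u2": ["h2", "h5", "h6"]}, k=2))
```

Precision and recall reward accuracy, while freshness and coverage guard against a model that keeps recommending the same few stale listings.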
  • 58. Other problems we’re tackling
    ● Classifiers for listing descriptions
    ● Deep learning on listing images
    ● Structured streaming on Spark 2.0
    ● Real-time scoring
    We’re hiring data scientists & engineers! zillow.com/jobs
    Questions? shrutik@zillow & nicholass@zillow
  • 59. Recommendations team, Data Science, and Personalization Product team: Imri S, Alex C, Shruti K, Nicholas S, Hao X, Purvag P, Kurtis M, XiaoXi G, Ming-Li L, Jason C, Andy W, Eric T, Andrew M, and others!