The Zestimate System
Kevin Powell
Director Zestimates & AI Platform
Zillow Group
Zillow Mission
We’re on a mission to give people the
power to unlock life’s next chapter.
Zillow serves the full lifecycle of owning and living in a home: buying, selling,
renting, financing, remodeling and more. It starts with Zillow's living database of
more than 110 million U.S. homes - including homes for sale, homes for rent and
homes not currently on the market, as well as Zestimate home values, Rent
Zestimates and other home-related information.
Agenda
1. The Zestimate in Zillow
2. The Zestimate in Production
3. ML at Zillow Group
4. Zillow’s AI Platform Process
Zillow Home Details Page (HDP)
Zillow as iBuyer
Zestimate Metrics
From https://www.zillow.com/zestimate/
Zestimate in Production
• Languages: R and Python
• Data Storage: on-prem RDBMSs
• Compute: on-prem hosts
• Framework: in-house parallelization library (ZPL)
• Staff: Data Analysts and Scientists

• Languages: Python and R
• Data Storage: AWS (S3), Redis
• Compute: AWS EMR, Lambda
• Framework: Apache Spark
• Staff: Scientists, Machine Learning Engineers and SDEs

Zestimate System
• Languages: Python
• Data Storage: ZG Data Platform
• Compute: k8s
• Framework: ZG AI Platform
• Staff: Scientists, Machine Learning Engineers and SDEs
Zestimates Modeling Scale
Zestimates ML Pipeline
● Approximately 3600 counties
● 10 models per county
● Train & Score models
● Push to production daily
Wiki Commons Source
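As a rough sense of scale, the two figures on this slide imply tens of thousands of models refreshed per daily run. A back-of-the-envelope check using only those numbers (the variable names are illustrative, not taken from the pipeline):

```python
# Back-of-the-envelope count from the slide's figures.
COUNTIES = 3600          # "approximately 3600 counties"
MODELS_PER_COUNTY = 10   # "10 models per county"

models_per_run = COUNTIES * MODELS_PER_COUNTY
print(f"{models_per_run:,} models trained and scored per daily run")
```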
Zestimates Batch Workflow
● Complex single workflow
● Ensemble models
● Concurrent execution
Zestimate Architecture: The Big Picture
1. Data Ingestion & Storage
2. Batch Data Processing
3. Real-Time Data Processing
4. Data Serving
Zestimates as Time Machine
Below, we see the evolution of a home over time:
• Constructed in 2010 with 2 bedrooms and 1 bath
• A full bath was added five years later, increasing the square footage
• Finally, another bedroom and a half bath were added
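The "time machine" idea can be sketched as a point-in-time lookup over a home's fact history. This is a minimal illustration, not the production data model: `FactSnapshot`, `HISTORY`, and `facts_as_of` are hypothetical names, and the date of the final remodel is invented because the slide does not give one.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FactSnapshot:
    effective: date      # date from which these facts hold
    bedrooms: int
    bathrooms: float


# Hypothetical history matching the slide: built in 2010, a full bath
# added five years later, then another bedroom plus a half bath.
# The 2018 date is invented for illustration; the slide gives no year.
HISTORY = [
    FactSnapshot(date(2010, 1, 1), bedrooms=2, bathrooms=1.0),
    FactSnapshot(date(2015, 1, 1), bedrooms=2, bathrooms=2.0),
    FactSnapshot(date(2018, 1, 1), bedrooms=3, bathrooms=2.5),
]


def facts_as_of(history, when):
    """Return the snapshot in effect on `when`, or None if before construction."""
    dates = [snap.effective for snap in history]
    idx = bisect_right(dates, when)
    return history[idx - 1] if idx else None
```

Asking for the facts as of mid-2012 returns the original 2-bedroom, 1-bath record, which is what a model scoring a 2012 sale would need to see.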
Batch Layer Highlights
ETL
● Ingests master data and standardizes it across many sources
● De-dupes, cleanses, and performs sanity checks on the data
● Extracts features
● Creates training and scoring sets
Train
● This is the layer where model training takes place
● We train models on various geographies, trading off data skew against data volume
Score
● This is the layer where batch scoring of properties takes place
● The scoring set is partitioned into uniform chunks for parallelization
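The partitioning step in the Score layer can be illustrated with a small sketch. It assumes a thread pool as a stand-in for the cluster and a made-up flat price-per-square-foot "model"; `uniform_chunks`, `score_chunk`, and `score_all` are hypothetical names, not part of the Zestimate codebase.

```python
from concurrent.futures import ThreadPoolExecutor


def uniform_chunks(items, n_chunks):
    """Split items into n_chunks contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def score_chunk(chunk):
    # Placeholder model: a hypothetical flat price per square foot.
    return [sqft * 200.0 for sqft in chunk]


def score_all(sqft_values, n_chunks=4):
    """Score every chunk in parallel and stitch results back in input order."""
    chunks = uniform_chunks(sqft_values, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = list(pool.map(score_chunk, chunks))  # map preserves chunk order
    return [score for part in results for score in part]
```

Keeping chunk sizes within one item of each other means no single worker becomes the straggler that holds up the whole scoring run.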
Speed Layer
Responding to data changes quickly
• The number one source of Zestimate error is the facts that flow into it: bedrooms, bathrooms, and square footage.
To combat this:
• Update Zestimates quickly: we want to recalculate Zestimates when homes are listed on the market with their facts updated.
• We give homeowners the ability to update such facts and immediately see the change in their Zestimate.
Speed Layer Architecture
● Kinesis consumer service - responsible for low-latency transformations to the data and new score calculations.
● Zestimate API - exposes the models to perform real-time scoring.
● Redis cache - we trust the batch output and cache it for real-time use.
○ Does not perform heavy-duty cleansing of the data
○ Much of the data cleansing in the batch layer relies on a longitudinal view of the data.
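The flow through the speed layer can be sketched roughly as follows. This is an illustration under stated assumptions: a plain dictionary stands in for the Redis cache, no Kinesis client is shown, and `score` is a placeholder with made-up coefficients rather than the Zestimate API. All names here are hypothetical.

```python
# In-memory stand-in for the Redis cache of batch output (hypothetical layout).
batch_cache = {
    "home-1": {"bedrooms": 2, "bathrooms": 1.0, "sqft": 1000},
}


def score(facts):
    # Placeholder for a call to the scoring API; the coefficients are made up.
    return 150.0 * facts["sqft"] + 10_000 * facts["bedrooms"] + 5_000 * facts["bathrooms"]


def handle_fact_update(event, cache=batch_cache):
    """Apply a fact-change event on top of cached batch facts and rescore.

    The cached record is trusted as-is: no heavy cleansing happens here,
    since that needs the longitudinal view available only in batch.
    """
    cached = cache.get(event["home_id"])
    if cached is None:
        return None  # nothing from batch yet; leave it to the next batch run
    merged = {**cached, **event["changes"]}  # event values override cached ones
    return score(merged)
```

The handler never writes back to the cache, which keeps the batch output as the single trusted baseline until the next batch run replaces it.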
Serving Layer Architecture
• We still rely on our SQL Server for serving Zestimates on Zillow.com
• Reconciliation of views requires knowing when the batch layer started: if a home fact comes in after the batch layer began, we serve the speed layer's calculation.
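The reconciliation rule can be written down as a small function. This is a sketch of the rule as stated on the slide, with hypothetical names (`pick_served_value` and its parameters); the real serving path is not described in the deck.

```python
from datetime import datetime


def pick_served_value(batch_value, batch_started_at, speed_value=None, fact_received_at=None):
    """Serve the speed-layer value only when its fact arrived after batch start."""
    if speed_value is not None and fact_received_at is not None:
        if fact_received_at > batch_started_at:
            return speed_value
    # Otherwise the batch run already saw the fact (or there is no update).
    return batch_value


batch_start = datetime(2020, 2, 1, 2, 0)
late_fact = datetime(2020, 2, 1, 9, 30)    # arrived after the batch began
early_fact = datetime(2020, 1, 31, 18, 0)  # already visible to the batch run
```

With these example timestamps, the late fact causes the speed-layer value to be served, while the early fact leaves the batch value in place.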
Batch Deployment
● 30+ Git repos
● Two-stage build and deploy
● EMR Spark Hybrid
Sample Metrics

Stability Metrics:
MetricName            Regional Aggregations
MoMB5                 County, National, State
MoMB10                County, National, State
MoMB20                County, National, State
MoMB50                County, National, State
MoMC5                 County, National, State

Process Metrics:
MetricName            Regional Aggregations
EstimateCount         County, National, State
PublishedZestimates   County, National, State
ModelPercentile10     County, National, State
ModelPercentile25     County, National, State
ModelPercentile50     County, National, State

Accuracy Metrics:
MetricName            Regional Aggregations
PredVsActual          County, National, State
MPE                   County, National, State
MAPE                  County, National, State
AAPE                  County, National, State
APE                   County, National, State
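The slide lists the accuracy metrics by name only. One plausible reading, offered here as an assumption rather than Zillow's definitions, is that APE is the absolute percent error per home, MPE the median percent error, MAPE the median absolute percent error, and AAPE the average absolute percent error. Under that reading, a minimal sketch (function names are hypothetical):

```python
from statistics import mean, median


def percent_errors(predicted, actual):
    """Signed percent error per home: (predicted - actual) / actual."""
    return [(p - a) / a for p, a in zip(predicted, actual)]


def accuracy_summary(predicted, actual):
    """Summaries under the ASSUMED expansions of MPE, MAPE, and AAPE."""
    pe = percent_errors(predicted, actual)
    ape = [abs(e) for e in pe]  # APE: absolute percent error per home
    return {
        "MPE": median(pe),    # assumed: median percent error (signed, shows bias)
        "MAPE": median(ape),  # assumed: median absolute percent error
        "AAPE": mean(ape),    # assumed: average absolute percent error
    }
```

A signed median near zero alongside a larger absolute median would indicate estimates that are unbiased on balance but still noisy for individual homes.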
Zillow Prize
ML at Zillow Group
AI at Zillow
● ZILLOW PREMIER AGENTS
● PERSONALIZED RECOMMENDATIONS
● ZESTIMATES
● ZILLOW OFFERS
● VIRTUAL TOURS
● CONVERSATIONAL ASSISTANTS
The Platform Process
Why a platform?
Modeling Velocity = Business Velocity
Common problems modelers face:
● Time spent on system-level issues
● Data access issues
● Lack of experimentation support
● Reproducibility
● Metrics/Logging
● ...
These are not new problems
from: Hidden Technical Debt in Machine Learning Systems - 2015
(https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf)
Zestimates Batch Workflow Reprise
● Complex single workflow
● Ensemble models
● Concurrent execution
Platforms explored (Feb. 2020)
Zillow Internal Platform
Selection Criteria
Leading Candidate...
https://github.com/michalbrys/kubeflow/blob/master/introduction/kubeflow-map.png
Q/A

Rsqrd AI: Zestimates and Zillow AI Platform
