Rsqrd AI: Zestimates and Zillow AI Platform

In this talk, Rsqrd AI welcomes Kevin Powell, Director of Zestimates & AI Platform at Zillow! Kevin speaks about the technology and complexity behind the Zestimate and its impact at Zillow.

**These slides are from a talk given at Rsqrd AI. Learn more at rsqrdai.org**

  1. The Zestimate System. Kevin Powell, Director, Zestimates & AI Platform, Zillow Group
  2. Zillow Mission: We’re on a mission to give people the power to unlock life’s next chapter. Zillow serves the full lifecycle of owning and living in a home: buying, selling, renting, financing, remodeling and more. It starts with Zillow's living database of more than 110 million U.S. homes, including homes for sale, homes for rent and homes not currently on the market, as well as Zestimate home values, Rent Zestimates and other home-related information.
  3. Agenda: (1) The Zestimate in Zillow; (2) The Zestimate in Production; (3) ML at Zillow Group; (4) Zillow’s AI Platform Process
  4. Zillow Home Details Page (HDP)
  5. Zillow as iBuyer
  6. Zestimate Metrics (from https://www.zillow.com/zestimate/)
  7. Zestimate in Production
  8. Zestimate System: the slide compares three stack configurations
      • Languages: R and Python; data storage: on-prem RDBMSs; compute: on-prem hosts; framework: in-house parallelization library (ZPL); staff: data analysts and scientists
      • Languages: Python and R; data storage: AWS (S3), Redis; compute: AWS EMR, Lambda; framework: Apache Spark; staff: scientists, machine learning engineers and SDEs
      • Languages: Python; data storage: ZG Data Platform; compute: k8s; framework: ZG AI Platform; staff: scientists, machine learning engineers and SDEs
  9. Zestimates Modeling Scale: the Zestimates ML pipeline
      ● Approximately 3,600 counties
      ● 10 models per county
      ● Train and score models
      ● Push to production daily
      (Image source: Wikimedia Commons)
  10. Zestimates Batch Workflow
      ● Complex single workflow
      ● Ensemble models
      ● Concurrent execution
  11. Zestimate Architecture: The Big Picture. The diagram has four numbered components: (1) Data Ingestion & Storage, (2) Batch Data Processing, (3) Real-Time Data Processing, (4) Data Serving.
  12. Zestimates as Time Machine. The slide shows the evolution of one home over time:
      • Built in 2010 with 2 bedrooms and 1 bath
      • A full bath added five years later, increasing the square footage
      • Finally, another bedroom and a half bath added
  13. Batch Layer Highlights
      ETL
      ● Ingests master data and standardizes it across many sources
      ● De-dupes, cleanses and sanity-checks the data
      ● Performs feature extraction
      ● Creates training and scoring sets
      Train
      ● Model training takes place in this layer
      ● Models are trained over geographies of different sizes, trading off data skew against data volume
      Score
      ● Batch scoring of properties takes place in this layer
      ● The scoring set is partitioned into uniform chunks for parallel scoring
  14. Speed Layer: Responding to Data Changes Quickly
      • The number one source of Zestimate error is the facts that flow into it: bedrooms, bathrooms and square footage. To combat this:
        • Update Zestimates quickly: we want to recalculate Zestimates when homes are listed on the market with updated facts.
        • Let homeowners correct the data: homeowners can update these facts and immediately see the change in their Zestimate.
  15. Speed Layer Architecture
      ● Kinesis consumer service: performs low-latency data transformations and new score calculations
      ● Zestimate API: exposes the models for real-time scoring
      ● Redis cache: we trust the batch output and cache it for real-time use
        ○ Does not perform heavy-duty data cleansing
        ○ Much of the data cleansing in the batch layer relies on a longitudinal view of the data
  16. Serving Layer Architecture
      • We still rely on SQL Server to serve Zestimates on Zillow.com
      • Reconciling the batch and speed views requires knowing when the batch run started: if a home fact arrives after the batch run began, we serve the speed layer’s calculation; otherwise, we serve the batch result.
  17. Batch Deployment
      ● 30+ Git repos
      ● Two-stage build and deploy
      ● EMR Spark hybrid
  18. Sample Metrics (every metric is aggregated at the county, state and national level)
      • Stability metrics: MoMB5, MoMB10, MoMB20, MoMB50, MoMC5
      • Process metrics: EstimateCount, PublishedZestimates, ModelPercentile10, ModelPercentile25, ModelPercentile50
      • Accuracy metrics: PredVsActual, MPE, MAPE, AAPE, APE
  19. Zillow Prize
  20. ML at Zillow Group
  21. AI at Zillow: Zillow Premier Agents, Personalized Recommendations, Zestimates, Zillow Offers, Virtual Tours, Conversational Assistants
  22. The Platform Process
  23. Why a platform? Modeling velocity = business velocity. Common problems modelers face:
      ● Time spent on system-level issues
      ● Data access issues
      ● Lack of experimentation support
      ● Reproducibility
      ● Metrics/logging
      ● ...
  24. These are not new problems: see "Hidden Technical Debt in Machine Learning Systems" (NIPS 2015, https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf)
  25. Zestimates Batch Workflow Reprise
      ● Complex single workflow
      ● Ensemble models
      ● Concurrent execution
  26. Platforms explored (Feb. 2020): Zillow Internal Platform
  27. Selection Criteria
  28. Leading Candidate... (image: Kubeflow map, https://github.com/michalbrys/kubeflow/blob/master/introduction/kubeflow-map.png)
  29. Q/A
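Slide 12 calls the Zestimate a "time machine": valuing a home as of a past date needs the facts that were true on that date, not today's facts. A minimal sketch of such a point-in-time lookup, assuming every fact change is stored as a dated snapshot; `HomeFacts` and `facts_as_of` are hypothetical names, not Zillow's code, and the exact dates and all square footages are invented to mirror the slide's example home:

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HomeFacts:
    """Hypothetical snapshot of a home's facts, in force from `effective` onward."""
    effective: date
    bedrooms: int
    bathrooms: float  # a half bath counts as 0.5
    sqft: int


def facts_as_of(history: list[HomeFacts], when: date) -> HomeFacts | None:
    """Return the snapshot in force on `when`, or None if no snapshot precedes it.

    `history` must be sorted by `effective`.
    """
    dates = [snapshot.effective for snapshot in history]
    i = bisect_right(dates, when)  # count of snapshots effective on or before `when`
    return history[i - 1] if i > 0 else None


# The slide's example home; exact dates and all square footages are invented.
history = [
    HomeFacts(date(2010, 1, 1), bedrooms=2, bathrooms=1.0, sqft=1000),  # built
    HomeFacts(date(2015, 1, 1), bedrooms=2, bathrooms=2.0, sqft=1200),  # full bath added
    HomeFacts(date(2018, 1, 1), bedrooms=3, bathrooms=2.5, sqft=1500),  # bedroom + half bath
]
```

With this history, `facts_as_of(history, date(2016, 6, 1))` returns the 2015 snapshot (two full baths), so a mid-2016 valuation would not see the later bedroom and half bath. Binary search keeps each lookup logarithmic in the number of changes per home.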
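Slide 13 says the batch scoring set is partitioned into uniform chunks for parallelization. A minimal sketch of that idea, assuming "uniform" means chunk sizes differ by at most one; `score_chunk` is a hypothetical stand-in model ($200 per square foot), and threads are used only to keep the example self-contained, whereas a real CPU-bound job would use processes or a cluster framework such as the Spark setup listed on slide 8:

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def uniform_chunks(items: list, n_chunks: int) -> list[list]:
    """Split `items` into `n_chunks` contiguous chunks whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def score_chunk(chunk: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Hypothetical stand-in for a trained model: values each home at $200 per square foot."""
    return [(property_id, 200 * sqft) for property_id, sqft in chunk]


def batch_score(properties: list[tuple[int, int]], n_chunks: int = 4) -> list[tuple[int, int]]:
    """Score chunks concurrently and concatenate the results in input order."""
    chunks = uniform_chunks(properties, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        per_chunk = pool.map(score_chunk, chunks)  # map() yields results in chunk order
        return [row for chunk_result in per_chunk for row in chunk_result]
```

Equal-sized chunks keep workers finishing at roughly the same time, so no single straggler chunk dominates the run; for example, 10 properties split 3 ways gives chunks of 4, 3 and 3.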
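Slide 16 states the serving-layer rule: if a home fact arrives after the batch run began, the speed layer's calculation is served. A minimal sketch of that decision for one home, assuming each home carries the timestamp of its latest fact update and the speed layer's latest value; `Estimate` and `reconcile` are hypothetical names, and falling back to the batch value when no newer fact exists is our reading of the rule rather than something the slide spells out:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Estimate:
    value: float
    source: str  # "batch" or "speed"


def reconcile(batch_started_at: datetime,
              batch_value: float,
              fact_updated_at: datetime | None,
              speed_value: float | None) -> Estimate:
    """Choose which layer's value to serve for one home.

    A fact that arrived after the batch run started is missing from the batch output,
    so the speed layer's recalculation wins; otherwise the batch value is served.
    """
    if (fact_updated_at is not None
            and speed_value is not None
            and fact_updated_at > batch_started_at):
        return Estimate(speed_value, "speed")
    return Estimate(batch_value, "batch")
```

For example, with a batch run that started at 2:00 on 2020-03-01, a homeowner edit at 9:30 the same day makes `reconcile` return the speed value, while an edit from the previous day returns the batch value (all timestamps and values here are hypothetical).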
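Slide 18 lists MPE, MAPE and APE among the accuracy metrics, each aggregated by county, state and nation. A minimal sketch of how these are conventionally computed and grouped by region, assuming percent error is measured against the sale price as (predicted minus actual), so a positive MPE means systematic overestimation. AAPE, PredVsActual and the MoMB/MoMC stability metrics are not defined in the slides and are left out; `MedianAPE` is an extra summary added for illustration, and the region names and prices in the usage note are made up:

```python
from collections import defaultdict
from statistics import mean, median


def percent_errors(predicted, actual):
    """Signed percent error per sale: (predicted - actual) * 100 / actual (assumed convention)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    return [(p - a) * 100 / a for p, a in zip(predicted, actual)]


def accuracy_metrics(predicted, actual):
    pe = percent_errors(predicted, actual)
    ape = [abs(e) for e in pe]  # APE: absolute percent error of each sale
    return {
        "MPE": mean(pe),           # mean percent error: signed, exposes bias
        "MAPE": mean(ape),         # mean absolute percent error
        "MedianAPE": median(ape),  # robust summary of per-sale APE
    }


def by_region(rows):
    """rows: iterable of (region, predicted, actual); returns metrics per region."""
    grouped = defaultdict(lambda: ([], []))
    for region, predicted, actual in rows:
        grouped[region][0].append(predicted)
        grouped[region][1].append(actual)
    return {region: accuracy_metrics(preds, acts) for region, (preds, acts) in grouped.items()}
```

For predictions of 110, 90 and 100 against sale prices of 100 each, MPE is 0 (over- and underestimates cancel) while MAPE is about 6.67%, which is why both signed and absolute metrics are tracked.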
