1
DATA SCIENCE AT ZILLOW
Imri Sofer, Senior Data Scientist
2
Zillow Group’s mission
TO BUILD THE WORLD’S LARGEST, MOST TRUSTED AND VIBRANT HOME-RELATED MARKETPLACE
3
Zillow’s marketplace
• PROPERTY MANAGERS & LANDLORDS
• BUYERS & SELLERS
• RENTERS
• HOMEOWNERS
• REAL ESTATE AGENTS
• MORTGAGE PROVIDERS
4
For buyers
5
For buyers
6
For renters
7
Zillow Group’s two-step business model:
1. Make amazing products to attract users.
2. Professionals pay to showcase themselves to those users.
8
For buyers
9
The largest portfolio of real estate brands
CONSUMER BRANDS
BUSINESS BRANDS
10
Zillow Group’s audience continues to grow
[Chart: monthly unique users, quarterly average (millions). Seasonal peak of 171M unique visitors in May 2016.]
11
Why is data science important to Zillow? Because Zillow is data.
12
Zillow is data
- Our product is driven by data
  - The largest, most comprehensive housing dataset, in both breadth and depth.
  - Over 65 million have been updated by users.
- Our product generates data
  - 2 million agent reviews.
  - More than 300,000 lender reviews.
  - 1 TB of user activity every day.
- Data is our product
  - Users come to Zillow because they trust our housing data.
  - Users want to find a trusted agent and a lender that provides great rates and service.
  - We provide data free of charge to academic and institutional researchers.
  - Zillow.com/data: free consumer data (the Zillow Home Value Index is published monthly at every level from the nation, through states, down to neighborhoods).
13
Data Science and Engineering at Zillow
Clam Bake Beach Day, Aug 2016, at Golden Gardens Park in Seattle, WA
14
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
B2B
• Ad Campaigns
• Agent segmentation
• Search Engine Marketing (SEM)
Computer Vision
• Videos
• Photos
User Profiles
• Persona Predictions
• Journey location prediction
• Lender Recommendations
Recommendations
• Home recommendation
• Similar homes
• New regions to explore
• Explain recommendations
15
Machine Learning at Zillow
• Example page
Home Valuation
• Zestimate
• Zestimate Forecast
• Rent Zestimate
• Pricing Tool
• Best Time to List
• Zillow Home Value Index
• Zillow Rent Index
[Screenshot: example page]
16
Zestimate
Goals:
• High accuracy
• Low bias
• Independent
• Stable over time
• Robust to outliers
• High coverage (currently over 100 million homes)
• Able to respond to changes users make to home facts
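The accuracy and bias goals become measurable once estimates are compared with eventual sale prices. A minimal sketch, assuming accuracy is summarized by the median absolute percentage error, bias by the median signed percentage error, and closeness by the share of estimates within 5% of the sale price; these metric choices and the sample numbers are illustrative assumptions, not definitions from this deck.

```python
from statistics import median

def percent_errors(estimates, sale_prices):
    """Signed percentage error of each estimate relative to its sale price."""
    if len(estimates) != len(sale_prices):
        raise ValueError("estimates and sale_prices must have the same length")
    return [(est - price) / price for est, price in zip(estimates, sale_prices)]

def accuracy_report(estimates, sale_prices, within=0.05):
    errors = percent_errors(estimates, sale_prices)
    return {
        # Accuracy: typical size of the error, ignoring its direction.
        "median_abs_pct_error": median(abs(e) for e in errors),
        # Bias: a median far from zero signals systematic over- or under-valuation.
        "median_pct_error": median(errors),
        # Share of estimates close to the sale price (assumed 5% threshold).
        "share_within": sum(abs(e) <= within for e in errors) / len(errors),
    }

# Hypothetical estimates and sale prices for four homes.
report = accuracy_report([310_000, 495_000, 820_000, 205_000],
                         [300_000, 500_000, 800_000, 250_000])
print(report)
```

Using medians rather than means keeps a single badly valued home (here the last one, off by 18%) from dominating the summary, which matches the "robust to outliers" goal.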
17
Challenges with the Zestimate
• Some listings are missing features: how do we deal with missing data?
• Some listings have corrupted features (e.g., 28 bathrooms): how do we identify them?
• Some sale prices do not reflect the value of the home (e.g., a parent selling to their child): how do we deal with these outliers?
• Feature engineering: how can we translate previous sales into meaningful features?
• How do we identify where the model needs to be improved?
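The first three challenges can be illustrated with simple rules. A minimal sketch under stated assumptions: a missing value is imputed with the median of homes in the same hypothetical `zip` group, a bathroom count above a hypothetical cap of 12 is flagged as corrupted, and a sale is flagged as an outlier when its price per square foot has a robust (median absolute deviation) z-score above 3.5. Field names, thresholds, and data are illustrative, not Zillow's.

```python
from statistics import median

MAX_BATHROOMS = 12      # hypothetical plausibility cap; a value like 28 gets flagged
ROBUST_Z_CUTOFF = 3.5   # common rule-of-thumb cutoff for MAD-based z-scores

def impute_missing(homes, field, group_key="zip"):
    """Fill a missing `field` with the median of the same group (e.g. ZIP code)."""
    groups = {}
    for h in homes:
        if h.get(field) is not None:
            groups.setdefault(h[group_key], []).append(h[field])
    for h in homes:
        if h.get(field) is None and h[group_key] in groups:
            h[field] = median(groups[h[group_key]])
            h[field + "_imputed"] = True  # keep a flag so the model knows it was filled
    return homes

def flag_corrupted_bathrooms(homes):
    """Mark bathroom counts outside a plausible range as suspect."""
    for h in homes:
        baths = h.get("bathrooms")
        h["bathrooms_suspect"] = baths is not None and not (0 < baths <= MAX_BATHROOMS)
    return homes

def flag_price_outliers(homes):
    """Flag sales whose price per sqft is far from the median (robust z-score).

    Computed over all homes passed in; in practice you would compare within a local area.
    """
    ppsf = [h["sale_price"] / h["sqft"] for h in homes]
    med = median(ppsf)
    mad = median(abs(x - med) for x in ppsf)
    for h, x in zip(homes, ppsf):
        # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
        z = 0.6745 * (x - med) / mad if mad else 0.0
        h["price_outlier"] = abs(z) > ROBUST_Z_CUTOFF
    return homes

# Hypothetical sales in one ZIP code.
homes = [
    {"zip": "98101", "sqft": 1500, "bathrooms": 2, "sale_price": 600_000},
    {"zip": "98101", "sqft": None, "bathrooms": 28, "sale_price": 640_000},
    {"zip": "98101", "sqft": 1700, "bathrooms": 2.5, "sale_price": 680_000},
    {"zip": "98101", "sqft": 1600, "bathrooms": 2, "sale_price": 100_000},  # e.g. a family transfer
    {"zip": "98101", "sqft": 1400, "bathrooms": 1.5, "sale_price": 570_000},
]
flag_price_outliers(flag_corrupted_bathrooms(impute_missing(homes, "sqft")))
for h in homes:
    print(h["sqft"], h["bathrooms_suspect"], h["price_outlier"])
```

The median and MAD are used instead of the mean and standard deviation because the outliers being hunted would otherwise inflate the very statistics used to detect them.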
18
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
Computer Vision
• Videos
• Photos
19
Computer Vision at Zillow
• Images and videos play a big role in helping people buy or rent homes.
• Deep learning has recently driven major advances in computer vision (CV).
20
Let Zillow See
• Currently, Zestimates are based mainly on the location and size of a property; they do not consider its quality.
• Tax assessments may carry some information about a home’s quality, but not enough.
• For example, an interior upgrade would not change the tax assessment in most, if not all, cases.
21
• We train a deep convolutional neural network (CNN) to estimate quality.
[Diagram: Deep Convolutional Neural Network → Zestimate]
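To show the data flow inside such a network, here is a toy forward pass that maps a grayscale image to a quality score between 0 and 10. This is a minimal sketch, not Zillow's model: the single hand-set 3x3 filter, global average pooling, one-unit output layer, and sigmoid rescaled to 0-10 are all assumptions for illustration. A real model would stack many layers whose weights are learned from labeled photos.

```python
import math

def conv2d(image, kernel):
    """'Valid' 2D convolution (implemented as cross-correlation, like most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]

def global_average_pool(feature_map):
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

def quality_score(image, kernels, weights, bias):
    """Image -> conv -> ReLU -> pooled features -> linear layer -> score in (0, 10)."""
    features = [global_average_pool(relu(conv2d(image, k))) for k in kernels]
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 10.0 / (1.0 + math.exp(-logit))  # sigmoid rescaled to the 0-10 range

# Hypothetical, untrained parameters: one 3x3 edge-detecting filter.
EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
flat = [[0.5] * 5 for _ in range(5)]   # uniform 5x5 grayscale image: no edges
spot = [row[:] for row in flat]
spot[2][2] = 1.0                       # one brighter pixel creates an edge response
print(quality_score(flat, [EDGE], [2.0], -1.0))
print(quality_score(spot, [EDGE], [2.0], -1.0))
```

Because the weights are hand-set, the scores only demonstrate how pixel patterns flow through convolution, pooling, and a output layer; they carry no meaning about real home quality.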
22
Image quality scores (prediction)
[Example images grouped by predicted score: 0-3, 3-7, 7-10]
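Grouping continuous predictions into the three bands takes only a small helper. A minimal sketch, assuming scores lie on a 0-10 scale and that each band includes its lower bound (so a score of exactly 3 falls in the 3-7 band); the slide does not say how boundary values are assigned.

```python
import bisect

BAND_EDGES = [3.0, 7.0]               # assumed boundaries between the bands
BAND_LABELS = ["0-3", "3-7", "7-10"]

def quality_band(score):
    """Map a predicted quality score in [0, 10] to its display band."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score {score} is outside the 0-10 range")
    return BAND_LABELS[bisect.bisect_right(BAND_EDGES, score)]

print([quality_band(s) for s in (1.2, 3.0, 6.9, 7.0, 10.0)])
# ['0-3', '3-7', '3-7', '7-10', '7-10']
```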
23
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
Computer Vision
• Videos
• Photos
Recommendations
• Home recommendation
• Similar homes
• New regions to explore
• Explain recommendations
24
Recommending movies
25
Home Recommendations
• Our goal is to show users the homes that are relevant to them.
[Screenshots: Email | When viewing a home | Ranking search results]
26
Email Recommendation
• Goal: Take past user activity and generate relevant recommendations for new and existing listings.
• Challenges:
  • How do we transform user activity into a vector of features?
  • What do we want to optimize for? Clicks? Dwell time? Saves?
  • What should we do when users don’t have a browsing history (cold start)?
  • How can we scale the model to rank 2.5 million homes for 50 million buyers? Most recommendation algorithms are not built for a problem of this size (Netflix has 5,000 movies in its catalog).
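Several of these challenges can be illustrated together. A minimal sketch under stated assumptions: a user's activity is summarized as the average feature vector of the listings they viewed, candidates are scored by cosine similarity to that profile, users with no history fall back to a popularity ranking (one simple cold-start strategy), and `heapq.nlargest` keeps only the top k rather than sorting every candidate. The listing features and popularity counts are hypothetical; at the scale on the slide, scoring would have to run in a distributed system rather than in a single Python loop.

```python
import heapq
import math

def user_profile(viewed_ids, listing_features):
    """Summarize a user's activity as the mean feature vector of viewed listings."""
    vectors = [listing_features[i] for i in viewed_ids if i in listing_features]
    if not vectors:
        return None  # cold start: no usable browsing history
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(viewed_ids, listing_features, popularity, k=2):
    seen = set(viewed_ids)
    candidates = [i for i in listing_features if i not in seen]
    profile = user_profile(viewed_ids, listing_features)
    if profile is None:
        # Cold start: fall back to the most popular listings.
        return heapq.nlargest(k, candidates, key=lambda i: popularity.get(i, 0))
    # nlargest keeps a heap of size k instead of sorting every candidate.
    return heapq.nlargest(k, candidates, key=lambda i: cosine(profile, listing_features[i]))

# Hypothetical normalized listing features (e.g. price, size, urban density).
listing_features = {
    "A": [0.9, 0.8, 0.1], "B": [0.85, 0.75, 0.2], "C": [0.1, 0.2, 0.9],
    "D": [0.2, 0.1, 0.8], "E": [0.8, 0.9, 0.15],
}
popularity = {"A": 10, "B": 50, "C": 40, "D": 5, "E": 20}  # hypothetical view counts

print(recommend(["A"], listing_features, popularity))  # ['B', 'E']
print(recommend([], listing_features, popularity))     # ['B', 'C']
```

Which signal to rank by (clicks, dwell time, saves) is left open on the slide; here the similarity score simply stands in for whatever objective is chosen.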
27
user_id   listing_id   like
12        5            1
12        34           0
12        567          1
144       5            0
144       34           0
1550      567          1
28
Traditional User-Item matrix
[Figure: traditional user-item matrix, users by items]
29
Zillow’s User-Item matrix
[Figure: Zillow’s user-item matrix, users by items]
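The (user_id, listing_id, like) rows shown earlier are the sparse form of a user-item matrix. A minimal sketch, assuming the matrix is stored as a dictionary of dictionaries so that only observed interactions take memory; it pivots the example rows and reports how dense the matrix is. At the scale mentioned for email recommendations (50 million buyers by 2.5 million homes), a dense matrix would have 1.25 × 10^14 cells, which is why a sparse representation is assumed here.

```python
from collections import defaultdict

# Example interactions from the table above: (user_id, listing_id, like).
interactions = [
    (12, 5, 1), (12, 34, 0), (12, 567, 1),
    (144, 5, 0), (144, 34, 0), (1550, 567, 1),
]

def build_user_item(rows):
    """Pivot (user, item, value) triples into {user: {item: value}}."""
    matrix = defaultdict(dict)
    for user, item, value in rows:
        matrix[user][item] = value
    return dict(matrix)

def density(matrix):
    """Fraction of user-item cells that have an observed value."""
    items = {item for row in matrix.values() for item in row}
    observed = sum(len(row) for row in matrix.values())
    return observed / (len(matrix) * len(items))

m = build_user_item(interactions)
print(m[12])        # {5: 1, 34: 0, 567: 1}
print(density(m))   # 6 observed cells out of 3 users x 3 listings
```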
30
How can we generate meaningful features?
Date         user_id   listing_id   f1     f2    ...   f50   like
2017-01-02   12        5            0.89   0.3   ...   0.6   0
2017-01-09   12        34           0.90   0.1   ...   0.1   0
2017-01-29   12        567          0.82   0.8   ...   0.1   1
2017-01-02   144       5            0.19   0.9   ...   0.9   0
2017-02-20   144       34           0.40   0.3   ...   0.8   0
2017-02-03   1550      567          0.99   0.9   ...   0.8   1
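Once each (user, listing) pair has a feature vector, a pointwise model can predict `like`. A minimal sketch, assuming plain logistic regression (one of the techniques on the Tools slide) trained by batch gradient descent on only the three feature columns shown (f1, f2, f50); the learning rate and iteration count are arbitrary choices, and six rows are far too few for a real model.

```python
import math

# (f1, f2, f50, like) from the example rows above; f3 to f49 are elided on the slide.
rows = [
    (0.89, 0.3, 0.6, 0), (0.90, 0.1, 0.1, 0), (0.82, 0.8, 0.1, 1),
    (0.19, 0.9, 0.9, 0), (0.40, 0.3, 0.8, 0), (0.99, 0.9, 0.8, 1),
]

def predict(weights, bias, x):
    """Logistic model: probability that the user likes the listing."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, bias, data):
    total = 0.0
    for *x, y in data:
        p = min(max(predict(weights, bias, x), 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train_logistic(data, lr=1.0, epochs=5000):
    """Batch gradient descent on the mean log-loss."""
    n, d = len(data), len(data[0]) - 1
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for *x, y in data:
            error = predict(weights, bias, x) - y  # derivative of log-loss w.r.t. the logit
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

weights, bias = train_logistic(rows)
print("trained log-loss:", round(log_loss(weights, bias, rows), 3))
print([round(predict(weights, bias, x), 2) for *x, _ in rows])
```

In practice the Date column matters too: training on earlier rows and evaluating on later ones avoids leaking future behavior into the model.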
31
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
B2B
• Ad Campaigns
• Agent segmentation
• Search Engine Marketing (SEM)
Computer Vision
• Videos
• Photos
User Profiles
• Persona Predictions
• Journey location prediction
• Lender Recommendations
Recommendations
• Home recommendation
• Similar homes
• New regions to explore
• Explain recommendations
32
Tools
• Spark (Scala and Python)
• R
• Python (numpy, scipy, sklearn, pandas)
• Random forest
• Linear, logistic, and quantile regression
• Deep neural nets
• Matrix factorization
• Etc.
• AWS
33
Zillow Core Values
• Own it.
• Turn on the Lights.
• ZG is a Team Sport.
• Move Fast. Think Big.
• Winning is Fun.
• Act With Integrity.
34
We’re hiring!
• Data Scientist, Computer Vision and Deep Learning
• Software Engineer, Machine Learning
• Data Scientist, Machine Learning
• Internship opportunities across Analytics
- Glassdoor reviews: Top 10 in Seattle Business Magazine 100 Best Companies (#3)
- Glassdoor’s Employees’ Choice Best Places to Work; Glassdoor’s Best Benefits and Perks
www.zillow.com/jobs
www.zillow.com/data-science

More Related Content

What's hot

Machine Learning in 5 Minutes— Classification
Machine Learning in 5 Minutes— ClassificationMachine Learning in 5 Minutes— Classification
Machine Learning in 5 Minutes— ClassificationBrian Lange
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerAmazon Web Services
 
Predictive analysis and modelling
Predictive analysis and modellingPredictive analysis and modelling
Predictive analysis and modellinglalit Lalitm7225
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsKasun Chinthaka Piyarathna
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data miningKamal Acharya
 
Product management class rookie to pro
Product management class rookie to proProduct management class rookie to pro
Product management class rookie to proBim Akinfenwa
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introductionamiyadash
 
kinds of analytics
kinds of analyticskinds of analytics
kinds of analyticsBenila Paul
 
Introduction of Deep Learning
Introduction of Deep LearningIntroduction of Deep Learning
Introduction of Deep LearningMyungjin Lee
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleImpetus Technologies
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Venkata Reddy Konasani
 
An Introduction to Anomaly Detection
An Introduction to Anomaly DetectionAn Introduction to Anomaly Detection
An Introduction to Anomaly DetectionKenneth Graham
 
Full-stack Data Scientist
Full-stack Data ScientistFull-stack Data Scientist
Full-stack Data ScientistAlexey Grigorev
 
Data analytics with python introductory
Data analytics with python introductoryData analytics with python introductory
Data analytics with python introductoryAbhimanyu Dwivedi
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationImpetus Technologies
 
Fast AI with Image Classification.pptx
Fast AI with Image Classification.pptxFast AI with Image Classification.pptx
Fast AI with Image Classification.pptxAbraham Kong
 

What's hot (20)

Machine Learning in 5 Minutes— Classification
Machine Learning in 5 Minutes— ClassificationMachine Learning in 5 Minutes— Classification
Machine Learning in 5 Minutes— Classification
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMaker
 
Predictive analysis and modelling
Predictive analysis and modellingPredictive analysis and modelling
Predictive analysis and modelling
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its Applications
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
Product management class rookie to pro
Product management class rookie to proProduct management class rookie to pro
Product management class rookie to pro
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introduction
 
kinds of analytics
kinds of analyticskinds of analytics
kinds of analytics
 
Introduction of Deep Learning
Introduction of Deep LearningIntroduction of Deep Learning
Introduction of Deep Learning
 
Confusion Matrix Explained
Confusion Matrix ExplainedConfusion Matrix Explained
Confusion Matrix Explained
 
Semantic web an overview and projects
Semantic web   an  overview and projectsSemantic web   an  overview and projects
Semantic web an overview and projects
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scale
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
8017 25 image mining
8017 25 image mining8017 25 image mining
8017 25 image mining
 
An Introduction to Anomaly Detection
An Introduction to Anomaly DetectionAn Introduction to Anomaly Detection
An Introduction to Anomaly Detection
 
Full-stack Data Scientist
Full-stack Data ScientistFull-stack Data Scientist
Full-stack Data Scientist
 
Data analytics with python introductory
Data analytics with python introductoryData analytics with python introductory
Data analytics with python introductory
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
 
Fast AI with Image Classification.pptx
Fast AI with Image Classification.pptxFast AI with Image Classification.pptx
Fast AI with Image Classification.pptx
 

Similar to Overview of Data Science at Zillow

Neighborhood Match Pitch
Neighborhood Match PitchNeighborhood Match Pitch
Neighborhood Match PitchKostub Deshmukh
 
Roommatefax Inc. Pitch Deck
Roommatefax Inc. Pitch DeckRoommatefax Inc. Pitch Deck
Roommatefax Inc. Pitch DeckSteve Wolf
 
Neighborhood Match - Top 10 Team - Stanford Venture Lab 2012
Neighborhood Match - Top 10 Team - Stanford Venture Lab 2012Neighborhood Match - Top 10 Team - Stanford Venture Lab 2012
Neighborhood Match - Top 10 Team - Stanford Venture Lab 2012Zane Salim
 
Listing Presentation St. Charles IL(1)
Listing Presentation St. Charles IL(1)Listing Presentation St. Charles IL(1)
Listing Presentation St. Charles IL(1)Leslie Ebersole
 
Zillow + Optimizely: Building the Bridge to $20 Billion Revenue
Zillow + Optimizely: Building the Bridge to $20 Billion RevenueZillow + Optimizely: Building the Bridge to $20 Billion Revenue
Zillow + Optimizely: Building the Bridge to $20 Billion RevenueOptimizely
 
Peppy walls business model
Peppy walls   business modelPeppy walls   business model
Peppy walls business modelpavithran ayyala
 
Internet Marketing, EO Accelerator Presentation
Internet Marketing, EO Accelerator PresentationInternet Marketing, EO Accelerator Presentation
Internet Marketing, EO Accelerator PresentationTop Draw Inc.
 
GAMING INDUSTRY ENABLER. WE PROVIDE WHITE LABEL & CUSTOM GAMES PLATFORMS
GAMING INDUSTRY ENABLER. WE PROVIDE WHITE LABEL & CUSTOM GAMES PLATFORMSGAMING INDUSTRY ENABLER. WE PROVIDE WHITE LABEL & CUSTOM GAMES PLATFORMS
GAMING INDUSTRY ENABLER. WE PROVIDE WHITE LABEL & CUSTOM GAMES PLATFORMSGameZBoost
 
Today's Renter
Today's RenterToday's Renter
Today's RenterOn-Site
 
2016 Technology in the Vacation Rental Industry
2016 Technology in the Vacation Rental Industry2016 Technology in the Vacation Rental Industry
2016 Technology in the Vacation Rental IndustryAmy Hinote
 
6 Top Real Estate Managed Analytics Service Providers.pptx
6 Top Real Estate Managed Analytics Service Providers.pptx6 Top Real Estate Managed Analytics Service Providers.pptx
6 Top Real Estate Managed Analytics Service Providers.pptxKavika Roy
 
Reocon social media power of lead generation_1-29-2012
Reocon social media power of lead generation_1-29-2012Reocon social media power of lead generation_1-29-2012
Reocon social media power of lead generation_1-29-2012Ken Blevins
 
TripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech WorldTripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech WorldVoltDB
 
Tipping Point for CRE Tech - Brandon Weber, VTS
Tipping Point for CRE Tech - Brandon Weber, VTSTipping Point for CRE Tech - Brandon Weber, VTS
Tipping Point for CRE Tech - Brandon Weber, VTSRyan Slack
 
BSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, EvaluationsBSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, EvaluationsBigML, Inc
 
Atlas Arkansas What Would Google Do?
Atlas Arkansas What Would Google Do?Atlas Arkansas What Would Google Do?
Atlas Arkansas What Would Google Do?Atlas Integrated
 
ZingClick- Innovating solutions
ZingClick- Innovating solutionsZingClick- Innovating solutions
ZingClick- Innovating solutionsZing Click
 
Bali imedia april 2013 draft 2
Bali imedia april 2013 draft 2Bali imedia april 2013 draft 2
Bali imedia april 2013 draft 2Kanika
 

Similar to Overview of Data Science at Zillow (20)

Neighborhood Match Pitch
Neighborhood Match PitchNeighborhood Match Pitch
Neighborhood Match Pitch
 
QH_SalesPitch (2).pdf
QH_SalesPitch (2).pdfQH_SalesPitch (2).pdf
QH_SalesPitch (2).pdf
 
Roommatefax Inc. Pitch Deck
Roommatefax Inc. Pitch DeckRoommatefax Inc. Pitch Deck
Roommatefax Inc. Pitch Deck
 
Neighborhood Match - Top 10 Team - Stanford Venture Lab 2012
Neighborhood Match - Top 10 Team - Stanford Venture Lab 2012Neighborhood Match - Top 10 Team - Stanford Venture Lab 2012
Neighborhood Match - Top 10 Team - Stanford Venture Lab 2012
 
Usalytics.pitch.v3.1
Usalytics.pitch.v3.1Usalytics.pitch.v3.1
Usalytics.pitch.v3.1
 
Listing Presentation St. Charles IL(1)
Listing Presentation St. Charles IL(1)Listing Presentation St. Charles IL(1)
Listing Presentation St. Charles IL(1)
 
Zillow + Optimizely: Building the Bridge to $20 Billion Revenue
Zillow + Optimizely: Building the Bridge to $20 Billion RevenueZillow + Optimizely: Building the Bridge to $20 Billion Revenue
Zillow + Optimizely: Building the Bridge to $20 Billion Revenue
 
Peppy walls business model
Peppy walls   business modelPeppy walls   business model
Peppy walls business model
 
Internet Marketing, EO Accelerator Presentation
Internet Marketing, EO Accelerator PresentationInternet Marketing, EO Accelerator Presentation
Internet Marketing, EO Accelerator Presentation
 
GAMING INDUSTRY ENABLER. WE PROVIDE WHITE LABEL & CUSTOM GAMES PLATFORMS
GAMING INDUSTRY ENABLER. WE PROVIDE WHITE LABEL & CUSTOM GAMES PLATFORMSGAMING INDUSTRY ENABLER. WE PROVIDE WHITE LABEL & CUSTOM GAMES PLATFORMS
GAMING INDUSTRY ENABLER. WE PROVIDE WHITE LABEL & CUSTOM GAMES PLATFORMS
 
Today's Renter
Today's RenterToday's Renter
Today's Renter
 
2016 Technology in the Vacation Rental Industry
2016 Technology in the Vacation Rental Industry2016 Technology in the Vacation Rental Industry
2016 Technology in the Vacation Rental Industry
 
6 Top Real Estate Managed Analytics Service Providers.pptx
6 Top Real Estate Managed Analytics Service Providers.pptx6 Top Real Estate Managed Analytics Service Providers.pptx
6 Top Real Estate Managed Analytics Service Providers.pptx
 
Reocon social media power of lead generation_1-29-2012
Reocon social media power of lead generation_1-29-2012Reocon social media power of lead generation_1-29-2012
Reocon social media power of lead generation_1-29-2012
 
TripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech WorldTripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech World
 
Tipping Point for CRE Tech - Brandon Weber, VTS
Tipping Point for CRE Tech - Brandon Weber, VTSTipping Point for CRE Tech - Brandon Weber, VTS
Tipping Point for CRE Tech - Brandon Weber, VTS
 
BSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, EvaluationsBSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, Evaluations
 
Atlas Arkansas What Would Google Do?
Atlas Arkansas What Would Google Do?Atlas Arkansas What Would Google Do?
Atlas Arkansas What Would Google Do?
 
ZingClick- Innovating solutions
ZingClick- Innovating solutionsZingClick- Innovating solutions
ZingClick- Innovating solutions
 
Bali imedia april 2013 draft 2
Bali imedia april 2013 draft 2Bali imedia april 2013 draft 2
Bali imedia april 2013 draft 2
 

Recently uploaded

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Overview of Data Science at Zillow

  • 1. 11 DATA SCIENCE AT ZILLOW Imri Sofer, Senior Data Scientist
  • 2. 2 Zillow Group’s mission TO BUILD THE WORLD’S LARGEST, MOST TRUSTED AND VIBRANT HOME-RELATED MARKETPLACE
  • 3. 3 Zillow’s marketplace PROPERTY MANAGERS & LANDLORDS BUYERS & SELLERS RENTERS HOMEOWNERS REAL ESTATE AGENTS MORTGAGE PROVIDERS
  • 7. 7 Zillow Group two step business model: 1. Make amazing products to attract users 2. Professionals pay to show case themselves.
  • 9. 9 The largest portfolio of real estate brands CONSUMER BRANDS BUSINESS BRANDS
  • 10. 10 Zillow Group’s audience continues to grow MONTHLY UNIQUE USERS Quarterly average (Millions) 0 20 40 60 80 100 120 140 160 180 Seasonal peak of 171M Unique visitors in May 2016
  • 11. 1111 Why is data science important to Zillow? Because Zillow is data
  • 12. 12 Zillow is data - Our product is driven by data - The largest most comprehensive housing data (Breadth and depth). - Over 65 million have been updated by users. - Our product generates data - 2MM Reviews of agents. - More than 300,000 lender reviews. - 1TB of user activity every day. - Data is our product - Users come to Zillow because they trust our housing data. - Users want to find a trusted agent, and lender that provide great rates and services. - We provide data for free for academic/institutional researchers. - Zillow.com/data – free consumer data (Zillow home value index is available at a monthly frequency for the nation through states, to neighborhoods.)
  • 13. 13 Data Science and Engineering at Zillow Clam Bake Beach Day, Aug 2016, at Golden Gardens Park in Seattle, WA
  • 14. 14 Machine Learning at Zillow Home Valuation • Zestimate • Zestimate Forecast • Zillow Home Value Index • Rent Zestimate • Zillow Rent Index • Pricing Tool • Best Time to List B2B • Ad Campaigns • Agent segmentation • Search Engine Marketing (SEM) Computer Vision • Videos • Photos User Profiles • Persona Predictions • Journey location prediction • Lender Recommendations Recommendations • Home recommendation • Similar homes • New regions to explore • Explain recommendations
  • 15. 15 Machine Learning at Zillow • Example page Home Valuation • Zestimate • Zestimate Forecast • Rent Zestimate • Pricing Tool • Best Time to List • Zillow Home Value Index • Zillow Rent Index example page
  • 16. 16 Zestimate Goals: • High accuracy • Low bias • Independent • Stable over time • Robust to outliers • High coverage (over 100 million homes currently) • Able to respond to user fact changes
  • 17. 17 Challenges with the Zestimate • Some listings are missing features: how do we deal with missing data? • Some listings have corrupted features (e.g. 28 bathrooms): how do we identify those? • Some sale prices do not reflect the value of the home (e.g. a parent sells to their child): how do we deal with outliers? • Feature engineering: how can we translate previous sales into meaningful features? • How do we identify the places where the model needs to be improved?
  • 18. 18 Machine Learning at Zillow Home Valuation • Zestimate • Zestimate Forecast • Zillow Home Value Index • Rent Zestimate • Zillow Rent Index • Pricing Tool • Best Time to List Computer Vision • Videos • Photos
  • 19. 19 Computer Vision at Zillow • Images and videos play a big role in helping people buy/rent homes • Recent deep-learning advancements for CV
  • 20. 20 Let Zillow See • Currently, our Zestimates are based mainly on the location and size of a property; they do not consider its quality. • Tax assessments may carry some information about home quality, but not enough. • For example, an interior upgrade would not change the tax assessment in most, if not all, cases.
  • 21. 21 • We train a deep convolutional neural network (CNN) to estimate quality. [Diagram: deep convolutional neural network → Zestimate]
  • 22. 22 Image quality scores (prediction) [Example photos grouped by predicted score: 0-3, 3-7, 7-10]
  • 23. 23 Machine Learning at Zillow Home Valuation • Zestimate • Zestimate Forecast • Zillow Home Value Index • Rent Zestimate • Zillow Rent Index • Pricing Tool • Best Time to List Computer Vision • Videos • Photos Recommendations • Home recommendation • Similar homes • New regions to explore • Explain recommendations
  • 25. 25 Home Recommendations • Our goal is to show users the homes that are relevant to them, in three places: email, when viewing a home, and when ranking search results.
  • 26. 26 Email Recommendation • Goal: take past user activity and generate relevant recommendations for new and existing listings. • Challenges: • How do we transform user activity into a vector of features? • What do we want to optimize for? Clicks? Dwell time? Saves? • What should we do when users don’t have a browsing history (cold start)? • How can we scale the model to rank 2.5MM homes for 50M buyers? Most recommendation algorithms are not built for this problem (Netflix has 5,000 movies in its catalog).
  • 27. 27
      user_id  listing_id  like
      12       5           1
      12       34          0
      12       567         1
      144      5           0
      144      34          0
      1550     567         1
  • 30. 30 How can we generate meaningful features?
      Date        user_id  listing_id  f1    f2   ...  f50  like
      2017-01-02  12       5           0.89  0.3  ...  0.6  0
      2017-01-09  12       34          0.90  0.1  ...  0.1  0
      2017-01-29  12       567         0.82  0.8  ...  0.1  1
      2017-01-02  144      5           0.19  0.9  ...  0.9  0
      2017-02-20  144      34          0.40  0.3  ...  0.8  0
      2017-02-03  1550     567         0.99  0.9  ...  0.8  1
  • 31. 31 Machine Learning at Zillow Home Valuation • Zestimate • Zestimate Forecast • Zillow Home Value Index • Rent Zestimate • Zillow Rent Index • Pricing Tool • Best Time to List B2B • Ad Campaigns • Agent segmentation • Search Engine Marketing (SEM) Computer Vision • Videos • Photos User Profiles • Persona Predictions • Journey location prediction • Lender Recommendations Recommendations • Home recommendation • Similar homes • New regions to explore • Explain recommendations
  • 32. 32 Tools • Spark (Scala and Python) • R • Python (numpy, scipy, sklearn, pandas) • Random forest • Linear, logistic, quantile regressions. • Deep neural nets. • Matrix Factorization • Etc. • AWS
  • 33. 33 Zillow Core Values • Own it. • Turn on the Lights. • ZG is a Team Sport. • Move Fast. Think Big. • Winning is Fun. • Act With Integrity
  • 34. 34 We’re hiring! • Data Scientist, Computer Vision and Deep Learning • Software Engineer, Machine Learning • Data Scientist, Machine Learning • Internship opportunities across Analytics - Glassdoor reviews: Top 10 in Seattle Business Magazine 100 Best Companies (#3) - Glassdoor’s Employees’ Choice Best Places to Work; Glassdoor’s Best Benefits and Perks; www.zillow.com/jobs www.zillow.com/data-science
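Slide 16 lists high accuracy and low bias among the Zestimate goals. One common way to measure both is with percentage errors against actual sale prices: the median absolute percentage error for accuracy, the median signed percentage error for bias, and the share of homes valued within 5% of their sale price. The sketch below assumes exactly those three metrics and uses made-up numbers; it is an illustration, not Zillow's evaluation code.

```python
from statistics import median


def accuracy_and_bias(estimates, sale_prices):
    """Summarize valuation errors relative to actual sale prices."""
    # Signed percentage error: positive means the estimate was above the sale price.
    errors = [(est - price) / price for est, price in zip(estimates, sale_prices)]
    abs_errors = [abs(e) for e in errors]
    return {
        # Accuracy: typical size of the error, regardless of direction.
        "median_abs_pct_error": median(abs_errors),
        # Bias: typical signed error; near zero means no systematic over- or under-valuation.
        "median_pct_error": median(errors),
        # Share of homes whose estimate is within 5% of the sale price.
        "within_5pct": sum(e <= 0.05 for e in abs_errors) / len(abs_errors),
    }


# Hypothetical estimates and sale prices for four homes.
metrics = accuracy_and_bias([210_000, 190_000, 330_000, 400_000],
                            [200_000, 200_000, 300_000, 400_000])
print(metrics)
```

Using medians rather than means keeps a few badly mispriced sales from dominating the summary, which matters given the outlier sales discussed on slide 17.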
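Slide 17 raises three data-quality questions: missing features, corrupted features such as 28 bathrooms, and sale prices that do not reflect the value of the home. A minimal sketch of one possible answer to each, assuming median imputation, hand-set plausibility ranges (the `PLAUSIBLE_RANGES` values are hypothetical), and a robust z-score on price per square foot with the commonly used 3.5 cutoff. The homes are made up.

```python
from statistics import median

# Hypothetical plausibility ranges; real bounds would be tuned on the data.
PLAUSIBLE_RANGES = {"bedrooms": (0, 20), "bathrooms": (0, 15), "sqft": (100, 50_000)}


def impute_missing(homes, feature):
    """Fill missing values (absent or None) with the median of the observed values."""
    observed = [h[feature] for h in homes if h.get(feature) is not None]
    fill = median(observed)
    for h in homes:
        if h.get(feature) is None:
            h[feature] = fill
    return fill


def corrupted_features(home):
    """Names of features whose values fall outside their plausible range."""
    return [f for f, (lo, hi) in PLAUSIBLE_RANGES.items()
            if home.get(f) is not None and not lo <= home[f] <= hi]


def outlier_sales(homes, threshold=3.5):
    """Indexes of sales whose price per sqft is far from the median, using a
    robust z-score based on the median absolute deviation (MAD)."""
    ppsf = [h["price"] / h["sqft"] for h in homes]
    med = median(ppsf)
    mad = median(abs(x - med) for x in ppsf)
    if mad == 0:
        return []
    return [i for i, x in enumerate(ppsf) if 0.6745 * abs(x - med) / mad > threshold]


homes = [
    {"price": 300_000, "sqft": 1500, "bedrooms": 3, "bathrooms": 2},
    {"price": 320_000, "sqft": 1600, "bedrooms": None, "bathrooms": 2},
    {"price": 210_000, "sqft": 1000, "bedrooms": 2, "bathrooms": 28},
    {"price": 380_000, "sqft": 2000, "bedrooms": 4, "bathrooms": 3},
    {"price": 50_000, "sqft": 2000, "bedrooms": 3, "bathrooms": 2},  # below-market family sale
]
print(impute_missing(homes, "bedrooms"))  # fills the missing value in homes[1]
print(corrupted_features(homes[2]))
print(outlier_sales(homes))
```

Flagged sales would typically be excluded from training rather than silently corrected, since their prices say little about market value.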
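Slide 26 asks how to turn user activity into a feature vector, what to do on cold start, and how to rank a large catalog. A minimal sketch, assuming each listing already has a numeric feature vector and that a save signals more interest than a click (the `EVENT_WEIGHTS` values are hypothetical): the user vector is a weighted average of the listings the user touched, cold-start users fall back to a popularity ranking, and `heapq.nlargest` picks the top k without sorting every candidate. At 2.5MM homes and 50M buyers a production version would have to be distributed; slide 32 lists Spark among the tools.

```python
import heapq

# Hypothetical weights: how strongly each event type signals interest.
EVENT_WEIGHTS = {"click": 1.0, "save": 3.0}


def user_vector(events, listing_vectors):
    """Weighted average of the feature vectors of listings the user interacted with.
    Returns None when there is no usable history (cold start)."""
    total, weight_sum = None, 0.0
    for listing_id, event in events:
        w = EVENT_WEIGHTS.get(event, 0.0)
        vec = listing_vectors.get(listing_id)
        if w == 0.0 or vec is None:
            continue
        if total is None:
            total = [0.0] * len(vec)
        total = [t + w * v for t, v in zip(total, vec)]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else None


def recommend(events, listing_vectors, popularity, k=2):
    """Top-k unseen listings: personalized if there is history, popular otherwise."""
    u = user_vector(events, listing_vectors)
    seen = {listing_id for listing_id, _ in events}
    candidates = [lid for lid in listing_vectors if lid not in seen]

    def score(lid):
        if u is None:  # cold start: fall back to a non-personalized ranking
            return popularity.get(lid, 0)
        return sum(a * b for a, b in zip(u, listing_vectors[lid]))

    # nlargest keeps only k items in a heap instead of sorting every candidate.
    return heapq.nlargest(k, candidates, key=score)


# Hypothetical 2-feature listing vectors and popularity counts.
listings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
popularity = {"a": 5, "b": 1, "c": 10, "d": 3}
print(recommend([("a", "save")], listings, popularity))
print(recommend([], listings, popularity))
```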
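Slide 32 lists matrix factorization among the tools. Applied to the user-listing table on slide 27, a generic version learns a small vector per user and per listing whose dot product approximates `like`, which then gives scores for pairs that were never observed. This sketch is plain stochastic gradient descent on squared error with an L2 penalty; the rank, learning rate, and epoch count are arbitrary choices for the six-row example, and it is not necessarily how Zillow applies the technique.

```python
import random

# (user_id, listing_id, like) triples from the table on slide 27.
RATINGS = [(12, 5, 1), (12, 34, 0), (12, 567, 1), (144, 5, 0), (144, 34, 0), (1550, 567, 1)]


def factorize(ratings, k=2, lr=0.05, reg=0.01, epochs=1000, seed=0):
    """Learn k-dim user and listing vectors whose dot product approximates 'like',
    with stochastic gradient descent on squared error plus an L2 penalty."""
    rng = random.Random(seed)
    users = {u: [rng.gauss(0, 0.1) for _ in range(k)] for u, _, _ in ratings}
    items = {i: [rng.gauss(0, 0.1) for _ in range(k)] for _, i, _ in ratings}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = users[u], items[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(k):
                # Update both factors using their values from before this step.
                pu_f, qi_f = pu[f], qi[f]
                pu[f] += lr * (err * qi_f - reg * pu_f)
                qi[f] += lr * (err * pu_f - reg * qi_f)
    return users, items


def predict(users, items, u, i):
    return sum(a * b for a, b in zip(users[u], items[i]))


users, items = factorize(RATINGS)
# Score a pair that was never observed: user 1550 and listing 5.
print(round(predict(users, items, 1550, 5), 2))
```

Note 17 points out why this is harder at Zillow's scale than in a movie catalog: users and listings are of the same order of magnitude, and there are no popular items to lean on.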
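The feature table on slide 30 adds a Date column. One plausible use of it (an assumption; the slide does not say) is to build point-in-time features: each row may only use activity dated before it, so the model never trains on information from the future. The feature names `prior_interactions` and `prior_likes` are hypothetical; the rows are the dates, IDs, and labels from the table, with f1 to f50 omitted.

```python
from collections import defaultdict
from datetime import date

# (date, user_id, listing_id, like) rows from the table on slide 30.
ROWS = [
    (date(2017, 1, 2), 12, 5, 0),
    (date(2017, 1, 9), 12, 34, 0),
    (date(2017, 1, 29), 12, 567, 1),
    (date(2017, 1, 2), 144, 5, 0),
    (date(2017, 2, 20), 144, 34, 0),
    (date(2017, 2, 3), 1550, 567, 1),
]


def add_prior_activity_features(rows):
    """For each row, count the user's earlier interactions and likes using only
    rows dated strictly before it, so no future information leaks in."""
    history = defaultdict(list)  # user_id -> list of (date, like)
    for d, user, _, like in rows:
        history[user].append((d, like))
    out = []
    for d, user, listing, like in rows:
        past = [lk for pd, lk in history[user] if pd < d]
        out.append({
            "date": d, "user_id": user, "listing_id": listing, "like": like,
            "prior_interactions": len(past),
            "prior_likes": sum(past),
        })
    return out


feats = add_prior_activity_features(ROWS)
for row in feats:
    print(row["date"], row["user_id"], row["prior_interactions"], row["prior_likes"])
```

This quadratic scan is fine for a toy table; at scale one would sort events by date and keep running counts per user instead.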

Editor's Notes

  1. Roadmap for today: an overview of the company, data, and culture; an introduction to the Data Science and Engineering team and the problems we try to solve; and time at the end for general Q&A.
  2. Zillow was founded ten years ago with a simple but incredibly ambitious mission: to build the world’s largest, most trusted and most vibrant home-related marketplace. This means we are a company that creates a marketplace, and a marketplace has consumers and practitioners. We’re not a brokerage, an agent, or an MLS; we are creating a marketplace, a place where consumers and producers congregate to conduct commerce with one another.
  3. For buyers: we help buyers understand the state of the marketplace and what they can afford, provide information about each and every listing, recommend homes to them, and alert them when a new relevant listing comes on the market. We help them price a listing and choose an agent based on ratings and number of sales. For sellers: we help them price their home, see how many people view it online, and connect with an agent to help them sell, or sell by themselves.
  4. For agents and lenders: we provide a way to connect with new clients and to demonstrate their success.
  5. A few years ago, Zillow moved into rentals, and today it’s the leading rental site in the US.
  6. Here on the bottom right we can see where agents have an opportunity to connect with buyers.
  7. Ten years ago we were just Zillow, but our brand portfolio has grown over time and reflects our mission. Each brand strives to empower consumers through transparency. Zillow, Trulia and HotPads focus on homes and rentals nationwide; StreetEasy and Naked Apartments focus on NYC. Business brands: mortgage quotes/rates (Mortech) and a transaction platform (dotloop).
  8. Huge user base: 30MM rental shoppers per month. First in the real estate category, double our largest competitor (Realtor.com). 78% market share of all mobile-exclusive visitors to the real estate category. In July, half a billion homes were viewed on Zillow Mobile (270/second) (?????). Mortgages: 35 million requests in the last year.
  9. Steven
  10. There are 21 people in the picture. We are actually 48 people now, with 12 open positions. Our mission: we attack Zillow’s data science challenges. Today I’ll talk about the
  11. Start with a demo. The Zestimate is what made Zillow famous. It started on day 1, and it is what differentiates us from our competitors. <go over list> The Zillow Home Value Index is an economic index derived from the Zestimate. Today it is used by large financial institutions, organizations and municipalities to understand the real estate market and to support decision making. This means the Zestimate not only helps individuals value homes; it also helps decision makers understand the housing market.
  12. This is a supervised learning problem. Each home in our dataset has a set of features associated with it, and a sale price. Our goal is to predict the sale price from the features.
  13. David
  14. The Netflix page is highly personalized and tailored to the user’s interests. Each row gives a different way to organize movies. The first row, and others like it, is created by one model, which takes a collection of movies sharing a single attribute and ranks them according to the user’s viewing habits. The second row comes from a completely different model that ranks similarity between movies. All these rows are ordered by a third model. We would like to simplify the home-buying experience and make it as easy as choosing a movie on Netflix.
  15. Each type of recommendation answers a different need. Email: we would like to send users alerts when their dream home comes on the market, or show them homes they might not otherwise consider. The challenge is how not to spam. When viewing a home: show other similar homes that the user might like. When ranking search results: we need to choose the most relevant homes to put at the top of the list.
  16. In recommendation, what we usually have is a set of user-item pairs and a corresponding label. The idea is that if we can predict whether a user would like a listing, we can make good recommendations. This looks like a supervised learning problem, but in real life it’s much more complicated. How do we know if a user likes an item? Most users don’t tell us explicitly; for example, most users don’t rate movies or like videos on YouTube. Even when users do tell us, it doesn’t necessarily mean what we want it to mean. For instance, a user might not like a listing, but it can still be very relevant to her, because at this stage she is just exploring the market and wants to understand what she can afford. Listings for homes she will never buy help her understand her options. The challenge with recommendation is that we never solve the problem we would actually like to solve; we only solve a surrogate problem. So part of our work is finding the best surrogate problem to solve.
  17. We have a very large catalog. The number of users is on the same order as the number of items. There are no popular items. The user-item matrix is block diagonal.
  18. To complicate things, we have features associated with the listings, and we have user activity. How can we translate these into features that are predictive of the outcome?
  19. We’ve shown the mission, brands, and data; how do we get there? Zillow culture: people. Share people you like. David: ZG is a Team Sport, Turn on the Lights (anonymous questions, wikis, open discussion). Steven: Winning Is Fun (competition), Move Fast. Think Big. (hackweeks).
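Note 11 describes the Zillow Home Value Index as an economic index derived from the Zestimate, and slide 12 says it is published monthly from the nation through states down to neighborhoods. A simplified sketch, assuming the index for a region and month is the median Zestimate of the homes in it; the published methodology may differ, and the records (including the neighborhood names `n1`, `n2`, `n3`) are made up.

```python
from collections import defaultdict
from statistics import median


def home_value_index(records, level):
    """Median Zestimate per (region, month) at one geographic level.

    records: dicts with a 'month', a 'zestimate', and one key per level,
    e.g. 'nation', 'state', 'neighborhood'.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[level], rec["month"])].append(rec["zestimate"])
    return {key: median(values) for key, values in groups.items()}


records = [
    {"nation": "US", "state": "WA", "neighborhood": "n1", "month": "2016-05", "zestimate": 500_000},
    {"nation": "US", "state": "WA", "neighborhood": "n1", "month": "2016-05", "zestimate": 700_000},
    {"nation": "US", "state": "WA", "neighborhood": "n2", "month": "2016-05", "zestimate": 600_000},
    {"nation": "US", "state": "OR", "neighborhood": "n3", "month": "2016-05", "zestimate": 400_000},
]
for level in ("nation", "state", "neighborhood"):
    print(level, home_value_index(records, level))
```

Because every level is computed from the same home-level Zestimates, the national, state, and neighborhood series stay consistent with one another.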
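Note 12 frames the Zestimate as supervised learning: features in, sale price out. The simplest instance of that setup is ordinary least squares. The sketch below solves the normal equations with Gauss-Jordan elimination on hypothetical toy data generated from an exact linear rule (price = 50,000 + 100 × sqft + 10,000 × bedrooms). It is not the Zestimate model, which per slide 17 must also cope with missing data, corrupted features, and outliers.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y,
    solved with Gauss-Jordan elimination. w[0] is the intercept."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend a column of ones
    n = len(rows[0])
    # Augmented system [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict_price(weights, x):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


# Hypothetical homes: (sqft, bedrooms) and sale prices from the rule above.
X = [(1000, 2), (1500, 3), (2000, 3), (2500, 4), (1200, 2)]
y = [170_000, 230_000, 280_000, 340_000, 190_000]
w = fit_linear_regression(X, y)
print([round(v, 2) for v in w])
print(round(predict_price(w, (1800, 3))))
```

In practice one would use a library solver (for example from the numpy or sklearn tools on slide 32), which is more numerically robust than forming X^T X explicitly.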
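For the "similar homes" case in note 15, a common baseline is nearest neighbors in feature space. A minimal sketch, assuming each listing is already described by a numeric feature vector scaled so that no single feature (such as square footage) dominates; cosine similarity is one choice of metric among several, and the listing IDs and vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_homes(home_id, features, k=3):
    """The k listings whose feature vectors are most similar to home_id's."""
    target = features[home_id]
    others = [(cosine(target, v), h) for h, v in features.items() if h != home_id]
    others.sort(reverse=True)
    return [h for _, h in others[:k]]


features = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.8, 0.0, 0.2],
}
print(similar_homes("a", features, k=2))
```

This brute-force scan compares against every listing; for millions of homes an approximate nearest-neighbor index would be the usual replacement.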
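Note 16 argues that recommendation always solves a surrogate problem. Two common choices are sketched below with hypothetical thresholds: a binary label (the user saved the listing, or dwelled on it for at least 30 seconds), and an implicit-feedback confidence weight c = 1 + alpha * r in the style of Hu, Koren and Volinsky (2008), where r is assumed here to be a view count. Changing these definitions changes the problem the model actually optimizes.

```python
# Hypothetical parameters: picking them is part of picking the surrogate problem.
DWELL_SECONDS_THRESHOLD = 30.0
ALPHA = 10.0


def surrogate_label(event):
    """1 if the user saved the listing or stayed on it for at least the threshold."""
    saved = event.get("saved", False)
    long_dwell = event.get("dwell_seconds", 0.0) >= DWELL_SECONDS_THRESHOLD
    return int(saved or long_dwell)


def confidence(event):
    """Implicit-feedback confidence c = 1 + alpha * r, with r assumed to be the
    number of times the user viewed the listing."""
    return 1.0 + ALPHA * event.get("views", 0)


# Hypothetical user-listing events.
events = [
    {"views": 3, "dwell_seconds": 45.0},
    {"views": 1, "dwell_seconds": 5.0},
    {"views": 2, "dwell_seconds": 10.0, "saved": True},
]
print([(surrogate_label(e), confidence(e)) for e in events])
```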
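Note 17 calls the user-item matrix block diagonal. One reading (an assumption) is that users tend to search within a single market, so users and listings split into groups that rarely interact. The sketch finds such groups exactly, as connected components of the user-listing graph, using union-find; reordering rows and columns by component gives the block-diagonal layout, and each block could then be modeled separately. Real data would only be approximately block diagonal, so a production system would need a looser notion of cluster. The user and listing IDs are hypothetical.

```python
from collections import defaultdict


def interaction_blocks(interactions):
    """Group (user_id, listing_id) pairs into connected components of the
    user-listing graph. Each component is one diagonal block."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Tag nodes so a user ID and a listing ID with the same value stay distinct.
    for user, listing in interactions:
        union(("user", user), ("listing", listing))

    blocks = defaultdict(lambda: {"users": set(), "listings": set()})
    for user, listing in interactions:
        root = find(("user", user))
        blocks[root]["users"].add(user)
        blocks[root]["listings"].add(listing)
    return list(blocks.values())


blocks = interaction_blocks([(1, "sea1"), (2, "sea1"), (2, "sea2"), (3, "nyc1"), (4, "nyc1")])
for b in blocks:
    print(sorted(b["users"]), sorted(b["listings"]))
```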