1
DATA SCIENCE AT ZILLOW
Imri Sofer, Senior Data Scientist
2
Zillow Group’s mission
TO BUILD THE WORLD’S LARGEST, MOST TRUSTED AND VIBRANT HOME-RELATED MARKETPLACE
3
Zillow’s marketplace
PROPERTY MANAGERS & LANDLORDS
BUYERS & SELLERS
RENTERS
HOMEOWNERS
REAL ESTATE AGENTS
MORTGAGE PROVIDERS
4
For buyers
5
For buyers
6
For renters
7
Zillow Group’s two-step business model:
1. Make amazing products to attract users.
2. Professionals pay to showcase themselves.
8
For buyers
9
The largest portfolio of real estate brands
CONSUMER BRANDS
BUSINESS BRANDS
10
Zillow Group’s audience continues to grow
[Chart: monthly unique users, quarterly average (millions); y-axis from 0 to 180]
Seasonal peak of 171M unique visitors in May 2016
11
Why is data science
important to Zillow?
Because Zillow is data
12
Zillow is data
- Our product is driven by data
  - The largest, most comprehensive housing data (breadth and depth).
  - Over 65 million homes have been updated by users.
- Our product generates data
  - 2MM reviews of agents.
  - More than 300,000 lender reviews.
  - 1TB of user activity every day.
- Data is our product
  - Users come to Zillow because they trust our housing data.
  - Users want to find a trusted agent and a lender that provide great rates and services.
  - We provide free data to academic and institutional researchers.
  - Zillow.com/data: free consumer data (the Zillow Home Value Index is available at a monthly frequency, from the national level through states down to neighborhoods).
13
Data Science and Engineering at Zillow
Clam Bake Beach Day, Aug 2016, at Golden Gardens Park in Seattle, WA
14
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
B2B
• Ad Campaigns
• Agent segmentation
• Search Engine Marketing (SEM)
Computer Vision
• Videos
• Photos
User Profiles
• Persona Predictions
• Journey location prediction
• Lender Recommendations
Recommendations
• Home recommendation
• Similar homes
• New regions to explore
• Explain recommendations
15
Machine Learning at Zillow
• Example page
Home Valuation
• Zestimate
• Zestimate Forecast
• Rent Zestimate
• Pricing Tool
• Best Time to List
• Zillow Home Value Index
• Zillow Rent Index
example page
16
Zestimate
Goals:
• High accuracy
• Low bias
• Independent
• Stable over time
• Robust to outliers
• High coverage (over 100 million homes currently)
• Able to respond to changes in user-supplied home facts
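The first two goals can be made concrete with error metrics. The sketch below assumes accuracy is summarized as the median absolute percent error and bias as the median signed percent error against sale prices; the slides do not say which metrics Zillow uses, and the prices are made-up illustration values.

```python
from statistics import median

def percent_errors(estimates, sale_prices):
    """Signed percent error of each estimate relative to its sale price."""
    return [(est - price) / price for est, price in zip(estimates, sale_prices)]

def median_abs_percent_error(estimates, sale_prices):
    """Accuracy proxy: the typical size of the error, ignoring direction."""
    return median(abs(e) for e in percent_errors(estimates, sale_prices))

def median_percent_error(estimates, sale_prices):
    """Bias proxy: positive means estimates tend to run high."""
    return median(percent_errors(estimates, sale_prices))

# Made-up illustration data, not Zillow figures.
sale_prices = [300_000, 450_000, 200_000, 800_000]
estimates = [315_000, 441_000, 210_000, 760_000]

accuracy = median_abs_percent_error(estimates, sale_prices)
bias = median_percent_error(estimates, sale_prices)
print(f"median abs error: {accuracy:.1%}, median signed error: {bias:.1%}")
```

Medians rather than means are used here so that a few extreme sales do not dominate the summary, which fits the "robust to outliers" goal.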
17
Challenges with the Zestimate
• Some listings are missing features: how do we deal with missing data?
• Some listings have corrupted features (e.g., 28 bathrooms): how do we identify those?
• Some sale prices do not reflect the value of the home (e.g., a parent sells to a child): how do we deal with outliers?
• Feature engineering: how can we translate previous sales into meaningful features?
• How do we identify the places where the model needs to be improved?
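The first three challenges can be illustrated with simple rules: a plausibility range to catch corrupted values, median imputation for missing ones, and a ratio filter for sales far from the typical price. This is only a sketch; the field names, ranges, and the `max_ratio` threshold are hypothetical, and the talk does not describe the approach Zillow actually takes.

```python
from statistics import median

# Hypothetical plausibility ranges; real limits would be tuned on data.
PLAUSIBLE_RANGES = {"bathrooms": (0, 10), "sqft": (100, 20_000)}

def clean_features(listings):
    """Treat implausible values as missing, then fill gaps with the column median."""
    cleaned = [dict(row) for row in listings]  # copy so the input is untouched
    for name, (low, high) in PLAUSIBLE_RANGES.items():
        for row in cleaned:
            value = row.get(name)
            if value is not None and not (low <= value <= high):
                row[name] = None  # corrupted value, e.g. 28 bathrooms
        known = [row[name] for row in cleaned if row.get(name) is not None]
        fill = median(known)  # assumes at least one plausible value exists
        for row in cleaned:
            if row.get(name) is None:
                row[name] = fill
    return cleaned

def drop_price_outliers(sales, max_ratio=3.0):
    """Keep sales whose price is within a factor of max_ratio of the median."""
    mid = median(s["price"] for s in sales)
    return [s for s in sales if mid / max_ratio <= s["price"] <= mid * max_ratio]

listings = [
    {"bathrooms": 2, "sqft": 1500},
    {"bathrooms": 28, "sqft": 1800},    # corrupted bathroom count
    {"bathrooms": None, "sqft": 2100},  # missing bathroom count
    {"bathrooms": 3, "sqft": 1200},
]
sales = [{"price": 400_000}, {"price": 420_000}, {"price": 1}, {"price": 390_000}]

cleaned = clean_features(listings)
kept = drop_price_outliers(sales)
```

A nominal 1-dollar sale, such as a transfer between relatives, falls outside the ratio band and is dropped before training.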
18
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
Computer Vision
• Videos
• Photos
19
Computer Vision at Zillow
• Images and videos play a big role in helping people buy/rent homes
• Recent deep-learning advancements for CV
20
Let Zillow See
• As of now, our Zestimates are based mainly on the location and size of a property; they do not consider its quality.
• Tax assessments may carry some information about house quality, but not enough.
• For example, in most if not all cases, an interior upgrade would not change the tax assessment.
21
• We train a deep convolutional neural network (CNN) to estimate quality.
[Diagram labels: Deep Convolutional Neural Network, Zestimate]
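For readers new to CNNs, the basic building block is a small filter slid across an image, followed by a nonlinearity. The sketch below shows that single step in plain Python on a toy 4x4 image with a hand-picked edge filter. It is not Zillow's network: a real quality model would stack many learned filters using a deep-learning library.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# Toy grayscale image: dark left half, bright right half.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-picked filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

feature_map = relu(conv2d(image, kernel))
```

The resulting feature map is nonzero only in the middle column, where the edge sits. In a trained network the filter values are learned rather than chosen by hand.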
22
Image quality scores (prediction)
[0-3] [3-7] [7-10]
23
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
Computer Vision
• Videos
• Photos
Recommendations
• Home recommendation
• Similar homes
• New regions to explore
• Explain recommendations
24
Recommending movies
25
Home Recommendations
• Our goal is to show users the homes that are relevant to them.
Email
When viewing a home
Ranking search results
26
Email Recommendation
• Goal: Take past user activity and generate relevant recommendations for new and existing listings.
• Challenges:
  • How do we transform user activity into a vector of features?
  • What do we want to optimize for? Clicks? Dwell time? Saves?
  • What should we do when users don’t have a browsing history (cold start)?
  • How can we scale the model to rank 2.5MM homes for 50M buyers? Most recommendation algorithms are not built for this problem (Netflix has 5,000 movies in its catalog).
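The scale challenge can be checked with quick arithmetic using the 2.5MM homes and 50M buyers quoted above. The throughput figure and the region-based candidate filter below are hypothetical, included only to show why scoring every pair is impractical and how narrowing the candidates shrinks the problem.

```python
HOMES = 2_500_000
BUYERS = 50_000_000
PAIRS_PER_SECOND = 1_000_000  # hypothetical scoring throughput

all_pairs = HOMES * BUYERS
days_to_score_all = all_pairs / PAIRS_PER_SECOND / 86_400

def candidates_for(user_regions, listings_by_region, limit=500):
    """Hypothetical candidate step: only consider homes in regions the user browsed."""
    picked = []
    for region in user_regions:
        picked.extend(listings_by_region.get(region, []))
    return picked[:limit]

# Toy data reusing listing ids from the slides; region names are made up.
listings_by_region = {"region_a": [5, 34], "region_b": [567]}
shortlist = candidates_for(["region_a"], listings_by_region)

print(f"{all_pairs:.2e} user-home pairs, about {days_to_score_all:,.0f} days to score")
```

With at most a few hundred candidates per buyer, the number of pairs to score drops by several orders of magnitude.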
27
user_id  listing_id  like
12       5           1
12       34          0
12       567         1
144      5           0
144      34          0
1550     567         1
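The like table above can be turned into a user-item matrix. A minimal sketch, assuming a dictionary of dictionaries as the sparse representation (the talk does not specify a storage format):

```python
from collections import defaultdict

# (user_id, listing_id, like) rows copied from the slide.
interactions = [
    (12, 5, 1), (12, 34, 0), (12, 567, 1),
    (144, 5, 0), (144, 34, 0), (1550, 567, 1),
]

def build_user_item(rows):
    """Map each user to the listings they interacted with and the like flag."""
    matrix = defaultdict(dict)
    for user_id, listing_id, like in rows:
        matrix[user_id][listing_id] = like
    return matrix

matrix = build_user_item(interactions)
users = sorted(matrix)
items = sorted({listing for row in matrix.values() for listing in row})

# Fraction of user-listing cells that have an observation.
density = len(interactions) / (len(users) * len(items))
```

Only observed cells are stored, which matters when most users have seen a tiny fraction of the listings.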
28
Traditional User-Item matrix
Users
Traditional
Items
29
Zillow’s User-Item matrix
Users
Zillow
Items
30
How can we generate meaningful features?
Date        user_id  listing_id  f1    f2   ...  f50  like
2017-01-02  12       5           0.89  0.3  ...  0.6  0
2017-01-09  12       34          0.90  0.1  ...  0.1  0
2017-01-29  12       567         0.82  0.8  ...  0.1  1
2017-01-02  144      5           0.19  0.9  ...  0.9  0
2017-02-20  144      34          0.40  0.3  ...  0.8  0
2017-02-03  1550     567         0.99  0.9  ...  0.8  1
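Once each user-listing pair has a feature vector, predicting `like` becomes an ordinary classification problem. The sketch below fits a logistic regression by gradient descent on the three feature columns visible on the slide (f1, f2, f50). The model choice, learning rate, and epoch count are assumptions for illustration, not the method described in the talk.

```python
import math

# (f1, f2, f50) and the like label from the slide; f3..f49 are elided there.
rows = [
    ((0.89, 0.3, 0.6), 0),
    ((0.90, 0.1, 0.1), 0),
    ((0.82, 0.8, 0.1), 1),
    ((0.19, 0.9, 0.9), 0),
    ((0.40, 0.3, 0.8), 0),
    ((0.99, 0.9, 0.8), 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Predicted probability that the user likes the listing."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def log_loss(weights, bias, data):
    """Mean negative log-likelihood of the labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for features, label in data:
        p = predict(weights, bias, features)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(data)

def train(data, learning_rate=0.5, epochs=2000):
    """Full-batch gradient descent on the mean log loss, starting from zeros."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in data:
            error = predict(weights, bias, features) - label
            grad_b += error
            for k, x in enumerate(features):
                grad_w[k] += error * x
        bias -= learning_rate * grad_b / len(data)
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
    return weights, bias

weights, bias = train(rows)
scores = [predict(weights, bias, features) for features, _ in rows]
```

Sorting candidate listings by the predicted probability gives a ranking for each user. Six rows are far too few to learn anything meaningful, so this shows only the mechanics.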
31
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
B2B
• Ad Campaigns
• Agent segmentation
• Search Engine Marketing (SEM)
Computer Vision
• Videos
• Photos
User Profiles
• Persona Predictions
• Journey location prediction
• Lender Recommendations
Recommendations
• Home recommendation
• Similar homes
• New regions to explore
• Explain recommendations
32
Tools
• Spark (Scala and Python)
• R
• Python (numpy, scipy, sklearn, pandas)
• Random forest
• Linear, logistic, quantile regressions.
• Deep neural nets.
• Matrix Factorization
• Etc.
• AWS
33
Zillow Core Values
• Own it.
• Turn on the Lights.
• ZG is a Team Sport.
• Move Fast. Think Big.
• Winning is Fun.
• Act With Integrity
34
We’re hiring!
• Data Scientist, Computer Vision and Deep Learning
• Software Engineer, Machine Learning
• Data Scientist, Machine Learning
• Internship opportunities across Analytics
- Glassdoor reviews: Top 10 in Seattle Business Magazine 100 Best Companies (#3)
- Glassdoor’s Employees’ Choice Best Places to Work; Glassdoor’s Best Benefits and Perks
www.zillow.com/jobs
www.zillow.com/data-science



Overview of Data Science at Zillow

  • 1. 11 DATA SCIENCE AT ZILLOW Imri Sofer, Senior Data Scientist
  • 2. 2 Zillow Group’s mission TO BUILD THE WORLD’S LARGEST, MOST TRUSTED AND VIBRANT HOME-RELATED MARKETPLACE
  • 3. 3 Zillow’s marketplace PROPERTY MANAGERS & LANDLORDS BUYERS & SELLERS RENTERS HOMEOWNERS REAL ESTATE AGENTS MORTGAGE PROVIDERS
  • 7. 7 Zillow Group two step business model: 1. Make amazing products to attract users 2. Professionals pay to show case themselves.
  • 9. 9 The largest portfolio of real estate brands CONSUMER BRANDS BUSINESS BRANDS
  • 10. 10 Zillow Group’s audience continues to grow MONTHLY UNIQUE USERS Quarterly average (Millions) 0 20 40 60 80 100 120 140 160 180 Seasonal peak of 171M Unique visitors in May 2016
  • 11. 1111 Why is data science important to Zillow? Because Zillow is data
  • 12. 12 Zillow is data - Our product is driven by data - The largest most comprehensive housing data (Breadth and depth). - Over 65 million have been updated by users. - Our product generates data - 2MM Reviews of agents. - More than 300,000 lender reviews. - 1TB of user activity every day. - Data is our product - Users come to Zillow because they trust our housing data. - Users want to find a trusted agent, and lender that provide great rates and services. - We provide data for free for academic/institutional researchers. - Zillow.com/data – free consumer data (Zillow home value index is available at a monthly frequency for the nation through states, to neighborhoods.)
  • 13. 13 Data Science and Engineering at Zillow Clam Bake Beach Day, Aug 2016, at Golden Gardens Park in Seattle, WA
  • 14. 14 Machine Learning at Zillow Home Valuation • Zestimate • Zestimate Forecast • Zillow Home Value Index • Rent Zestimate • Zillow Rent Index • Pricing Tool • Best Time to List B2B • Ad Campaigns • Agent segmentation • Search Engine Marketing (SEM) Computer Vision • Videos • Photos User Profiles • Persona Predictions • Journey location prediction • Lender Recommendations Recommendations • Home recommendation • Similar homes • New regions to explore • Explain recommendations
  • 15. 15 Machine Learning at Zillow • Example page Home Valuation • Zestimate • Zestimate Forecast • Rent Zestimate • Pricing Tool • Best Time to List • Zillow Home Value Index • Zillow Rent Index example page
  • 16. 16 Zestimate Goals: • High Accuracy • Low Bias • Independent • Stable over time. • Robust to outliers. • High coverage (Over 100 million homes currently) • Able to respond to user fact changes
  • 17. 17 Challenges with the Zestimate • Some listings are missing features: How do we deal with missing data? • Some listings have corrupted features (e.g. 28 bathrooms): How do we identify those? • Some sale prices do not reflect the value of the home(e.g. a parent sales to his child): how do we deal with outliers? • Feature engineering: How can we translate previous sales to meaningful features? • How do we identify the places where the model needs to be improve?
  • 18. 18 Machine Learning at Zillow Home Valuation • Zestimate • Zestimate Forecast • Zillow Home Value Index • Rent Zestimate • Zillow Rent Index • Pricing Tool • Best Time to List Computer Vision • Videos • Photos
  • 19. 19 Computer Vision at Zillow • Images and videos play a big role in helping people buy/rent homes • Recent deep-learning advancements for CV
  • 20. 20 Let Zillow See • As of now, our Zestimates are mainly based on the location and size of the properties; they do not consider quality. • Tax assessments might carry house quality information to some extent, but that's not enough. • For example, an interior upgrade would not change the tax assessment in most, if not all, cases.
  • 21. 21 • We train a deep convolutional neural network (CNN) to estimate quality. Deep Convolutional Neural Network Zestimate
  • 22. 22 Image quality scores (prediction) [0-3] [3-7] [7-10]
  • 23. 23 Machine Learning at Zillow Home Valuation • Zestimate • Zestimate Forecast • Zillow Home Value Index • Rent Zestimate • Zillow Rent Index • Pricing Tool • Best Time to List Computer Vision • Videos • Photos Recommendations • Home recommendation • Similar homes • New regions to explore • Explain recommendations
  • 25. 25 Home Recommendations • Our goal is to show users the homes that are relevant to them. Email When viewing a home Ranking search results
  • 26. 26 Email Recommendation • Goal: Take past user activity and generate relevant recommendations for new and existing listings. • Challenges: • How do we transform user activity into a vector of features? • What do we want to optimize for? Clicks? Dwell time? Saves? • What should we do when users don't have a browsing history (cold start)? • How can we scale the model to rank 2.5MM homes for 50M buyers? Most recommendation algorithms are not built for this problem (Netflix has 5000 movies in its catalog).
  • 27. 27
      user_id  listing_id  like
      12       5           1
      12       34          0
      12       567         1
      144      5           0
      144      34          0
      1550     567         1
  • 30. 30 How can we generate meaningful features?
      Date        user_id  listing_id  f1    f2   ...  f50  like
      2017-01-02  12       5           0.89  0.3  ...  0.6  0
      2017-01-09  12       34          0.90  0.1  ...  0.1  0
      2017-01-29  12       567         0.82  0.8  ...  0.1  1
      2017-01-02  144      5           0.19  0.9  ...  0.9  0
      2017-02-20  144      34          0.40  0.3  ...  0.8  0
      2017-02-03  1550     567         0.99  0.9  ...  0.8  1
  • 31. 31 Machine Learning at Zillow Home Valuation • Zestimate • Zestimate Forecast • Zillow Home Value Index • Rent Zestimate • Zillow Rent Index • Pricing Tool • Best Time to List B2B • Ad Campaigns • Agent segmentation • Search Engine Marketing (SEM) Computer Vision • Videos • Photos User Profiles • Persona Predictions • Journey location prediction • Lender Recommendations Recommendations • Home recommendation • Similar homes • New regions to explore • Explain recommendations
  • 32. 32 Tools • Spark (Scala and Python) • R • Python (numpy, scipy, sklearn, pandas) • Random forest • Linear, logistic, quantile regressions. • Deep neural nets. • Matrix Factorization • Etc. • AWS
  • 33. 33 Zillow Core Values • Own it. • Turn on the Lights. • ZG is a Team Sport. • Move Fast. Think Big. • Winning is Fun. • Act With Integrity
  • 34. 34 We’re hiring! • Data Scientist, Computer Vision and Deep Learning • Software Engineer, Machine Learning • Data Scientist, Machine Learning • Internship opportunities across Analytics - Glassdoor reviews: Top 10 in Seattle Business Magazine 100 Best Companies (#3) - Glassdoor’s Employees’ Choice Best Places to Work; Glassdoor’s Best Benefits and Perks - www.zillow.com/jobs - www.zillow.com/data-science
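
Slide 17 asks how to handle missing and corrupted listing features (e.g. 28 bathrooms). The deck does not say how Zillow does this, so the following is only a minimal sketch of one common approach: treat implausible values as missing, then fill missing values with the column median. The `PLAUSIBLE_RANGES` limits, the `clean_listings` helper, and the sample rows are all hypothetical and chosen for illustration.

```python
from statistics import median

# Hypothetical plausibility limits (assumption); real limits would be tuned on data.
PLAUSIBLE_RANGES = {"bathrooms": (0.5, 10), "sqft": (100, 20000)}


def clean_listings(listings):
    """Mark out-of-range values as missing, then impute with the column median."""
    cleaned = [dict(row) for row in listings]  # copy so the input is not mutated
    for feature, (low, high) in PLAUSIBLE_RANGES.items():
        # Step 1: a value outside the plausible range is treated as corrupted.
        for row in cleaned:
            value = row.get(feature)
            if value is not None and not (low <= value <= high):
                row[feature] = None
        # Step 2: impute every missing value with the median of the observed ones.
        observed = [row[feature] for row in cleaned if row.get(feature) is not None]
        fill = median(observed) if observed else None
        for row in cleaned:
            if row.get(feature) is None:
                row[feature] = fill
    return cleaned


# Made-up sample rows, not real Zillow data.
listings = [
    {"id": 1, "bathrooms": 2, "sqft": 1500},
    {"id": 2, "bathrooms": 28, "sqft": 1800},    # corrupted feature
    {"id": 3, "bathrooms": None, "sqft": 2100},  # missing feature
    {"id": 4, "bathrooms": 3, "sqft": 1200},
]
cleaned = clean_listings(listings)
```

Median imputation is used here only because it is simple and robust to the extreme values the slide mentions; a production model might instead learn from missingness directly.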

Editor's Notes

  1. Roadmap for today: Overview of company, data, and culture Introduce the Data Science and Engineering team and the problems we try to solve Leave time at the end for general Q&A
  2. Zillow was founded ten years ago with a simple but incredibly ambitious mission: To build the world’s largest, most trusted and most vibrant home-related marketplace. What this means is that we’re a company that creates a marketplace, and a marketplace has consumers and practitioners. We’re not a brokerage, not an agent, not an MLS. We are creating a marketplace: a place where consumers and producers congregate to conduct commerce with one another.
  3. For buyers: We help buyers understand the state of the marketplace and what they can afford, provide information about each and every listing, recommend homes to them, and alert them when a relevant new listing comes to market. We help them price a listing and choose an agent based on ratings and number of sales. For sellers: We help them price their home and see how many people view it online, and we connect them to an agent to help them sell, or let them sell by themselves.
  4. For agents and lenders: We provide a way to connect with new clients and to demonstrate their success.
  5. A few years ago Zillow went into rentals and today it’s the leading site in this category in the US.
  6. Here on the bottom right we can see where agents have an opportunity to connect with buyers.
  7. Ten years ago, we were just Zillow, but our brand portfolio grew over time and reflects our mission. Each brand strives to empower consumers through transparency. Zillow, Trulia and Hotpads focus on homes and rentals nationwide. StreetEasy and Naked Apartments focus on NYC. Business brands: mortgage quotes/rates (Mortech), transaction platform (dotloop).
  8. Huge user base. 30MM rental shoppers per month. First in the real estate class, double our largest competitor (Realtor.com). 78% market share of all mobile-exclusive visitors to the real estate category. In July, half a billion homes were viewed on Zillow Mobile (270/second) (?????). Mortgages: 35 million requests in the last year.
  9. Steven
  10. There are 21 people in the picture. We are actually 48 people now, and have 12 open positions. Our mission: We attack Zillow’s DS challenges. Today I’ll talk about the
  11. Start with demo. The Zestimate is what made Zillow so famous. It started on day 1, and it is what differentiates us from our competitors. <go over list> The Zillow Home Value Index is an economic index derived from the Zestimate. Today it is used by large financial institutions, organizations and municipalities to understand the real estate market and to support decision making. This means that the Zestimate not only helps individuals value homes, it also helps decision makers understand the housing market.
  12. This is a supervised learning problem. Each home in our dataset has a set of features associated with it, along with its sale price. Our goal is to predict the sale price using the features.
  13. David
  14. The Netflix page is very personalized and tailored to the user's interests. Each row gives a different way to organize movies. The first and created by the same model, which gets a collection of movies with a single attribute and ranks them according to the user's viewing habits. The second row is from a completely different model that ranks similarity between movies. All these rows are ordered by a third model. - We would like to simplify the home buying experience and make it as easy as choosing a movie on Netflix.
  15. Each type of recommendation answers different needs. Email: We would like to send users alerts when their dream home comes on the market, or show them homes that they might not otherwise consider. The challenge is how not to spam. When viewing a home: We show other similar homes that the user might like. When ranking search results: We need to choose the most relevant homes to go to the top of the list.
  16. In recommendation, what we usually have is a set of user-item pairs and a corresponding label. The idea is that if we can predict whether a user would like a listing, we could make good recommendations. This looks like a supervised learning problem, but in real life it’s much more complicated. - How do we know if a user likes an item? Most users don’t explicitly tell us. For example, most users don’t rate movies or like videos on YouTube. Even when a user tells us, it does not necessarily mean what we want it to mean. For instance, a user might not like a listing, yet it was very relevant for her because at this stage she’s just exploring the market and would like to understand what she can afford. So listings for homes we will never buy help us understand our options. The challenge with recommendation is that we never solve the problem that we would like to solve; we only solve a surrogate problem. So part of our work is to find the best surrogate problem to solve.
  17. We have a very large catalog. The number of users is on the same order as the number of items. No popular items. Block diagonal matrix.
  18. To complicate things, we have features associated with the listings, and we have user activity. How can we translate that into features that are predictive of the outcome?
  19. Shown mission/brands/data – how do we get there Zillow culture - people Share people you like David – ZG is a team sport, turn on the lights (anonymous questions, wikis, open discussion) Steven –Winning Is Fun – competition, Move Fast Think Big (hackweeks)
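
The user-item pairs from slide 27 and the scale question from slide 26 can be made concrete with a small sketch. It stores the slide's six rows as a sparse user-to-listing map (only observed pairs are kept) and computes how many user-listing pairs a full ranking of 2.5MM homes for 50M buyers would involve. The `build_sparse_matrix` helper and the dictionary layout are assumptions made for illustration, not a description of Zillow's system.

```python
from collections import defaultdict

# Rows copied from slide 27: (user_id, listing_id, like).
interactions = [
    (12, 5, 1), (12, 34, 0), (12, 567, 1),
    (144, 5, 0), (144, 34, 0), (1550, 567, 1),
]


def build_sparse_matrix(rows):
    """Store only observed (user, listing) pairs: user_id -> {listing_id: like}."""
    matrix = defaultdict(dict)
    for user_id, listing_id, like in rows:
        matrix[user_id][listing_id] = like
    return matrix


matrix = build_sparse_matrix(interactions)

# Fraction of the user x listing grid that is actually observed.
n_users = len(matrix)
n_listings = len({listing_id for _, listing_id, _ in interactions})
density = len(interactions) / (n_users * n_listings)

# Scale quoted on slide 26: scoring every home for every buyer.
full_pairs = 2_500_000 * 50_000_000
print(f"observed density: {density:.2f}, full pair count: {full_pairs:.3e}")
```

At the scale on slide 26 the dense grid has on the order of 10^14 cells, which is why a sparse representation (and some way to narrow the candidate homes per user before scoring) is a natural starting point.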