10. 10
Zillow Group’s audience continues to grow
[Chart: Monthly unique users, quarterly average (millions); seasonal peak of 171M unique visitors in May 2016]
11. 11
Why is data science important to Zillow?
Because Zillow is data
12. 12
Zillow is data
- Our product is driven by data
- The largest, most comprehensive housing data (breadth and depth).
- Over 65 million have been updated by users.
- Our product generates data
- 2MM agent reviews.
- More than 300,000 lender reviews.
- 1TB of user activity every day.
- Data is our product
- Users come to Zillow because they trust our housing data.
- Users want to find a trusted agent and a lender that provides great rates and service.
- We provide data for free for academic/institutional researchers.
- Zillow.com/data: free consumer data (the Zillow Home Value Index is available monthly at every level, from the nation, through states, down to neighborhoods).
13. 13
Data Science and Engineering at Zillow
Clam Bake Beach Day, Aug 2016, at Golden Gardens Park in Seattle, WA
14. 14
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
B2B
• Ad Campaigns
• Agent segmentation
• Search Engine Marketing (SEM)
Computer Vision
• Videos
• Photos
User Profiles
• Persona Predictions
• Journey location prediction
• Lender Recommendations
Recommendations
• Home recommendation
• Similar homes
• New regions to explore
• Explain recommendations
15. 15
Machine Learning at Zillow
• Example page
Home Valuation
• Zestimate
• Zestimate Forecast
• Rent Zestimate
• Pricing Tool
• Best Time to List
• Zillow Home Value Index
• Zillow Rent Index
16. 16
Zestimate
Goals:
• High Accuracy
• Low Bias
• Independent
• Stable over time
• Robust to outliers
• High coverage (over 100 million homes currently)
• Able to respond to user fact changes
17. 17
Challenges with the Zestimate
• Some listings are missing features: how do we deal with missing data?
• Some listings have corrupted features (e.g., 28 bathrooms): how do we identify them?
• Some sale prices do not reflect the value of the home (e.g., a parent sells to their child): how do we deal with outliers?
• Feature engineering: how can we translate previous sales into meaningful features?
• How do we identify the places where the model needs to be improved?
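The first three challenges can be illustrated with a minimal cleaning pass. This is a sketch under stated assumptions, not Zillow's pipeline: the field names (`sqft`, `bathrooms`, `price`), the 15-bathroom plausibility cap, and the median-absolute-deviation (MAD) outlier rule with `k=3.0` are all hypothetical choices made for illustration.

```python
from statistics import median

# Hypothetical plausibility cap; a real threshold would come from data analysis.
MAX_BATHROOMS = 15


def clean_listings(listings):
    """Impute missing square footage and flag implausible bathroom counts.

    Missing values are filled with the median of the known values, and a
    boolean indicator is kept so a model can still learn from "missingness".
    """
    known_sqft = [row["sqft"] for row in listings if row.get("sqft") is not None]
    fill_sqft = median(known_sqft)
    cleaned = []
    for row in listings:
        out = dict(row)  # copy so the caller's data is not mutated
        out["sqft_missing"] = out.get("sqft") is None
        if out["sqft_missing"]:
            out["sqft"] = fill_sqft
        # e.g. 28 bathrooms is almost certainly a data-entry error
        out["baths_suspect"] = (out.get("bathrooms") or 0) > MAX_BATHROOMS
        cleaned.append(out)
    return cleaned


def flag_price_outliers(sales, k=3.0):
    """Flag sales whose price per square foot is more than k MADs from the median.

    Median and MAD are used instead of mean and standard deviation because
    they are not dragged around by the very outliers we want to detect.
    """
    ppsf = [s["price"] / s["sqft"] for s in sales]
    med = median(ppsf)
    mad = median(abs(x - med) for x in ppsf)
    return [mad > 0 and abs(x - med) / mad > k for x in ppsf]


listings = [
    {"sqft": 1200, "bathrooms": 2},
    {"sqft": None, "bathrooms": 28},
    {"sqft": 1800, "bathrooms": 3},
]
print(clean_listings(listings)[1])

# Toy data (made up): a parent-to-child sale far below market among arm's-length sales
sales = [{"price": p, "sqft": 1000} for p in (200_000, 210_000, 190_000, 205_000, 20_000)]
print(flag_price_outliers(sales))
```

Flagged rows are kept rather than dropped, so a later step can decide whether to exclude them from training or down-weight them.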
18. 18
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
Computer Vision
• Videos
• Photos
19. 19
Computer Vision at Zillow
• Images and videos play a big role in helping people buy and rent homes
• Recent deep-learning advancements for CV
20. 20
Let Zillow See
• As of now, our Zestimates are mainly based on
location and size of the properties and they do not
consider the quality.
• Tax assessment might carry house quality
information up to some extent but that’s not
enough.
• For example, an interior upgrade would not change the
tax assessment in most cases if not all
21. 21
• We train a deep convolutional neural network (CNN) to estimate quality.
[Diagram: Deep Convolutional Neural Network → Zestimate]
23. 23
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
Computer Vision
• Videos
• Photos
Recommendations
• Home recommendation
• Similar homes
• New regions to explore
• Explain recommendations
25. 25
Home Recommendations
• Our goal is to show users the homes that are relevant to them.
[Screenshots: Email; When viewing a home; Ranking search results]
26. 26
Email Recommendation
• Goal: Take past user activity and generate relevant recommendations for new and existing listings.
• Challenges:
• How do we transform user activity into a vector of features?
• What do we want to optimize for? Clicks? Dwell time? Saves?
• What should we do when users don’t have a browsing history (cold start)?
• How can we scale the model to rank 2.5MM homes for 50M buyers? Most recommendation algorithms are not built for this problem (Netflix has 5,000 movies in its catalog).
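The first and third challenges can be sketched together: summarize a user's activity as a weighted average of the feature vectors of the homes they interacted with, and fall back to a default vector when there is no history. The event types, their weights, and the dot-product ranking are hypothetical choices for illustration, not the production model.

```python
# Hypothetical weights: a save signals more intent than a view.
EVENT_WEIGHTS = {"view": 1.0, "share": 2.0, "save": 3.0}


def user_profile(events, home_features, default=None):
    """Weighted mean of the feature vectors of homes a user interacted with.

    events: list of (home_id, event_type); home_features: home_id -> list of floats.
    Returns `default` when the user has no usable history (the cold-start case).
    """
    total_weight = 0.0
    acc = None
    for home_id, event in events:
        weight = EVENT_WEIGHTS.get(event, 0.0)
        feats = home_features.get(home_id)
        if weight == 0.0 or feats is None:
            continue  # unknown event type or unknown home: skip it
        if acc is None:
            acc = [0.0] * len(feats)
        for j, x in enumerate(feats):
            acc[j] += weight * x
        total_weight += weight
    if total_weight == 0.0:
        return default
    return [a / total_weight for a in acc]


def recommend(profile, home_features, n=2):
    """Rank homes by dot product with the user profile and return the top n ids."""
    def score(home_id):
        return sum(p * f for p, f in zip(profile, home_features[home_id]))
    return sorted(home_features, key=score, reverse=True)[:n]


homes = {"h1": [1.0, 0.0], "h2": [0.0, 1.0], "h3": [0.5, 0.5]}
profile = user_profile([("h1", "view"), ("h2", "save")], homes)
print(profile, recommend(profile, homes))
```

Scoring every home for every user, as `recommend` does, is exactly what stops scaling at 2.5MM homes and 50M buyers; a real system would first narrow candidates (for example by region) before ranking.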
31. 31
Machine Learning at Zillow
Home Valuation
• Zestimate
• Zestimate Forecast
• Zillow Home Value Index
• Rent Zestimate
• Zillow Rent Index
• Pricing Tool
• Best Time to List
B2B
• Ad Campaigns
• Agent segmentation
• Search Engine Marketing (SEM)
Computer Vision
• Videos
• Photos
User Profiles
• Persona Predictions
• Journey location prediction
• Lender Recommendations
Recommendations
• Home recommendation
• Similar homes
• New regions to explore
• Explain recommendations
32. 32
Tools
• Spark (Scala and Python)
• R
• Python (numpy, scipy, sklearn, pandas)
• Random forest
• Linear, logistic, and quantile regression
• Deep neural nets
• Matrix Factorization
• Etc.
• AWS
33. 33
Zillow Core Values
• Own it.
• Turn on the Lights.
• ZG is a Team Sport.
• Move Fast. Think Big.
• Winning is Fun.
• Act With Integrity.
34. 34
We’re hiring!
• Data Scientist, Computer Vision and Deep Learning
• Software Engineer, Machine Learning
• Data Scientist, Machine Learning
• Internship opportunities across Analytics
- Glassdoor reviews: Top 10 in Seattle Business Magazine 100 Best Companies (#3)
- Glassdoor’s Employees’ Choice Best Places to Work;
Glassdoor’s Best Benefits and Perks;
www.zillow.com/jobs
www.zillow.com/data-science
Editor's Notes
Roadmap for today:
Overview of company, data, and culture
Introduce the Data Science and Engineering team and the problems we try to solve
Leave time at the end for general Q&A
Zillow was founded ten years ago with a simple but incredibly ambitious mission: To build the world’s largest, most trusted and most vibrant home-related marketplace.
What this means is that we’re a company that creates a marketplace, and a marketplace has consumers and practitioners. We’re not a brokerage, an agent, or an MLS; we are creating a marketplace, a place where consumers and producers congregate to conduct commerce with one another.
For buyers:
- Help them understand the state of the marketplace and what they can afford.
- Provide them information about each and every listing.
- Recommend homes for them, and alert them when a new relevant listing comes to market.
- Help them price a listing.
- Help them choose an agent based on rating and number of sales.
For sellers:
- Help them price their home.
- Show them how many people view it online.
- Connect them to an agent to help them sell, or let them sell by themselves.
For agents and lenders:
- Provide a way to connect with new clients and to demonstrate their success.
A few years ago, Zillow went into rentals, and today it’s the leading rental site in the US.
Here on the bottom right we can see where agents have an opportunity to connect with buyers.
Ten years ago, we were just Zillow, but our brand portfolio grew over time and reflects our mission.
Each brand strives to empower consumers through transparency.
Zillow, Trulia, and Hotpads focus on homes and rentals nationwide. StreetEasy and Naked Apartments focus on NYC.
Business brands: Mortgage quotes/rates (Mortech), transaction platform (dotloop)
Huge user base.
30MM rental shoppers per month.
First in the real estate category: double our largest competitor (Realtor.com).
78% market share of all mobile-exclusive visitors to the real estate category.
In July, half a billion homes were viewed on Zillow Mobile (270/second) (?????)
Mortgages – 35 million requests in the last year.
Steven
There are 21 people in the picture. We are actually 48 people now, and have 12 open positions.
Our mission: We attack Zillow’s DS challenges.
Today I’ll talk about the
Start with demo
The Zestimate is what made Zillow famous. It has been there since day 1, and it is what differentiates us from our competitors.
<go over list>
The Zillow Home Value Index is an economic index derived from the Zestimate. Today it is used by large financial institutions, organizations, and municipalities to understand the real estate market and to inform decisions. This means the Zestimate not only helps individuals value homes; it also helps decision-makers understand the housing market.
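One simple way to turn home-level Zestimates into a regional index is to take the median Zestimate per region and month. The sketch below assumes exactly that; it is a simplified illustration, not Zillow's published ZHVI methodology, and the `(region, month, value)` input format and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_value_index(zestimates):
    """Aggregate (region, month, value) records into a median value per region and month."""
    groups = defaultdict(list)
    for region, month, value in zestimates:
        groups[(region, month)].append(value)
    return {key: median(values) for key, values in groups.items()}


def pct_change(index, region, month_from, month_to):
    """Percentage change of the index for one region between two months."""
    start, end = index[(region, month_from)], index[(region, month_to)]
    return 100.0 * (end - start) / start


# Toy data (made up)
records = [
    ("Seattle", "2016-04", 500_000), ("Seattle", "2016-04", 600_000), ("Seattle", "2016-04", 700_000),
    ("Seattle", "2016-05", 520_000), ("Seattle", "2016-05", 610_000), ("Seattle", "2016-05", 720_000),
]
idx = median_value_index(records)
print(idx[("Seattle", "2016-05")], pct_change(idx, "Seattle", "2016-04", "2016-05"))
```

Using the median rather than the mean keeps a handful of very expensive homes from dominating a region's index.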
This is a supervised learning problem. Each home in our dataset has a set of associated features and a sale price. Our goal is to predict the sale price from the features.
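To make the framing concrete: each training row pairs a feature vector with a sale price, and a model maps features to a predicted price. The sketch below deliberately swaps in a very simple learner, k-nearest neighbors with a median over the neighbors (a "comparable sales" flavor), rather than anything described in the talk; the features (square footage in thousands, bedrooms), `k=3`, and the toy prices are hypothetical. It also computes the median absolute percentage error, one common way to score valuation accuracy.

```python
import math
from statistics import median


def knn_price(train, query, k=3):
    """Predict a sale price as the median price of the k most similar homes.

    train: list of (features, sale_price), where features is a tuple of numbers
    already on comparable scales (unscaled features would let one dominate distance).
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return median(price for _, price in nearest)


def median_abs_pct_error(actual, predicted):
    """Median of |predicted - actual| / actual over a set of sales."""
    return median(abs(p - a) / a for a, p in zip(actual, predicted))


# Toy data (made up): (sqft in thousands, bedrooms) -> sale price
train = [
    ((1.0, 2), 300_000),
    ((1.1, 2), 320_000),
    ((1.2, 3), 340_000),
    ((3.0, 5), 900_000),
]
print(knn_price(train, (1.05, 2)))
```

Taking the median of the neighbors' prices, instead of their mean, is one small way to serve the "robust to outliers" goal from the Zestimate slide.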
David
The Netflix page is highly personalized and tailored to the user’s interests.
Each row gives a different way to organize movies.
Rows like the first one are created by the same model, which takes a collection of movies sharing a single attribute and ranks them according to the user’s viewing habits.
The second row comes from a completely different model that ranks similarity between movies.
All these rows are ordered by a third model.
- We would like to simplify the home buying experience and make it as easy as choosing a movie on Netflix.
Each type of recommendation answers different needs.
Email – We would like to send users alerts when their dream home comes on the market, or show them homes they might not otherwise consider. The challenge is to do this without spamming them.
When viewing a home, we show other similar homes that the user might like.
When ranking search results, we need to choose the most relevant homes to put at the top of the list.
In recommendation, what we usually have is a set of user-item pairs with corresponding labels. The idea is that if we can predict whether a user would like a listing, we can make good recommendations.
This seems like a supervised learning problem.
In real life it’s much more complicated.
- How do we know if a user likes an item? Most users don’t explicitly tell us. For example, most users don’t rate movies or like videos on YouTube.
Even when a user tells us, it does not necessarily mean what we want it to mean. For instance, a user might not like a listing, but it can still be very relevant to her, because at this stage she is just exploring the market and wants to understand what she can afford. Listings for homes she will never buy still help her understand her options.
The challenge with recommendation is that we never solve the problem we would actually like to solve; we only solve a surrogate problem. So part of our work is finding the best surrogate problem to solve.
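Choosing a surrogate problem often comes down to choosing which signal becomes the label. The sketch below turns the same impression log into three different label sets (click, dwell time, save), echoing the "Clicks? Dwell time? Saves?" question on the email slide; the field names and the 30-second dwell threshold are hypothetical. The same listing can be a positive under one surrogate and a negative under another.

```python
def surrogate_labels(impressions, target="dwell", min_dwell_s=30):
    """Turn an impression log into (user, listing, label) triples.

    target picks the surrogate: "click", "dwell" (dwell time >= min_dwell_s),
    or "save". None of these is what we truly want ("would this user like
    this home?"); each is a proxy with its own biases.
    """
    rules = {
        "click": lambda imp: imp["clicked"],
        "dwell": lambda imp: imp["dwell_s"] >= min_dwell_s,
        "save": lambda imp: imp["saved"],
    }
    rule = rules[target]  # raises KeyError for an unknown surrogate name
    return [(imp["user"], imp["listing"], int(rule(imp))) for imp in impressions]


# Toy impression log (made up)
log = [
    {"user": "u1", "listing": "h1", "clicked": True, "dwell_s": 5, "saved": False},
    {"user": "u1", "listing": "h2", "clicked": True, "dwell_s": 120, "saved": True},
    {"user": "u1", "listing": "h3", "clicked": False, "dwell_s": 0, "saved": False},
]
for target in ("click", "dwell", "save"):
    print(target, [label for _, _, label in surrogate_labels(log, target)])
```

Listing `h1` here is a positive under the click surrogate but a negative under dwell time, which is exactly the kind of disagreement that makes picking the surrogate part of the modeling work.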
We have a very large catalog.
The number of users is on the same order as the number of items.
There are no popular items.
The user-item matrix is block diagonal.
To complicate things further, we have features associated with the listings, and we have user activity. How can we translate these into features that are predictive of the outcome?
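Matrix factorization (listed under Tools) is one standard way to learn from a sparse user-item matrix. The sketch below fits user and item vectors with stochastic gradient descent on observed entries only; the toy block-diagonal matrix, the rank `k=2`, and the learning-rate and regularization values are hypothetical, and a catalog of Zillow's size would need a distributed implementation (for example Spark MLlib's ALS) rather than this pure-Python loop.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=2000, seed=0):
    """Learn U (n_users x k) and V (n_items x k) so that U[u] . V[i] ~ r.

    ratings: list of (user_index, item_index, value) for observed entries only.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)  # copy: shuffling must not reorder the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(U, V, u, i)
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # gradient step on squared error plus an L2 penalty
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def predict(U, V, u, i):
    """Predicted affinity of user u for item i."""
    return sum(a * b for a, b in zip(U[u], V[i]))


# Toy, block-diagonal interactions (made up): users 0-1 engage with homes 0-1,
# users 2-3 with homes 2-3; a few cross-block pairs are observed as 0.
observed = [
    (0, 0, 1.0), (0, 1, 1.0), (1, 0, 1.0), (2, 2, 1.0), (2, 3, 1.0), (3, 2, 1.0),
    (0, 2, 0.0), (1, 3, 0.0), (2, 0, 0.0), (3, 1, 0.0),
]
U, V = factorize(observed, n_users=4, n_items=4)
rmse = (sum((r - predict(U, V, u, i)) ** 2 for u, i, r in observed) / len(observed)) ** 0.5
print(round(rmse, 3))
```

When the matrix really is block diagonal, the blocks share no information, so one design option is to factorize each block separately, which also shrinks each problem.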
Shown mission/brands/data – how do we get there
Zillow culture - people
Share people you like
David – ZG is a team sport, turn on the lights (anonymous questions, wikis, open discussion)
Steven – Winning Is Fun (competition), Move Fast. Think Big. (hackweeks)