Computer Vision in Real Estate
PRESENTED BY VASSIL LUNCHEV
WWW.HOMEHEED.COM
There is a problem in Sofia’s Real Estate
• Demo 1
Setup
• We have 600,000 listings
• There are many duplicates
• Many listings are fake
Goals
• Cluster all listings into homes
• Classify each home as available or not available (as of today)
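The deck does not say how pairwise "same home" decisions become clusters. One common approach, assumed here, is to treat each listing as a node, add an edge for every pair judged equal, and take the connected components as homes. A minimal union-find sketch; the function name and listing IDs are hypothetical:

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def cluster_into_homes(
    listings: Iterable[Hashable],
    equal_pairs: Iterable[Tuple[Hashable, Hashable]],
) -> List[List[Hashable]]:
    """Group listings into homes: connected components of the 'equal' graph."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        # Walk up to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: Hashable, b: Hashable) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for listing in listings:
        parent[listing] = listing  # every listing starts as its own home
    for a, b in equal_pairs:
        union(a, b)

    homes: Dict[Hashable, List[Hashable]] = {}
    for listing in parent:
        homes.setdefault(find(listing), []).append(listing)
    return list(homes.values())


# Hypothetical listing IDs; the pairs stand in for a pairwise classifier's output.
homes = cluster_into_homes(["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c"), ("d", "e")])
print(sorted(sorted(h) for h in homes))  # [['a', 'b', 'c'], ['d', 'e']]
```

Connected components are transitive: "a equals b" and "b equals c" put a and c in the same home even if that pair was never compared, so a single false-positive edge merges two homes. That makes precision of the pairwise decision the critical quantity.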
Approaches (for clustering)
1. Location (GPS coordinates)
• Works great for Booking, Airbnb, Expedia
• Only 24% of the listings have GPS coordinates
• Even when a listing has a location, it is often “wrong”
2. Texts, numbers and categories
• Price, m², district, text description, …
• (demo 2)
3. Images
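For the minority of listings that do carry coordinates, approach 1 reduces to a distance check. A minimal sketch using the haversine great-circle distance; the 50 m radius and the function names are assumptions, and, as the slide warns, an inaccurate coordinate defeats any threshold:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def same_location(p, q, radius_m: float = 50.0) -> bool:
    """Hypothetical rule: two listings are candidates for the same home if within radius_m."""
    return haversine_m(*p, *q) <= radius_m


# Two hypothetical points in Sofia about 11 m apart (0.0001 degrees of latitude).
print(same_location((42.6977, 23.3219), (42.6978, 23.3219)))  # True
```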
The image-based approach
1. Find equal images
• Dataset of 5,000,000 images
• Keypoint matching
2. Given 2 listings, classify them as equal or non-equal
• A listing has about 10 images
• Machine learning classification
3. Is this home available today?
• 2 main signals – history and reputation
• Ground truth dataset
Finding equal images
• Demo 3
Keypoint matching
1. Detect keypoints
• 500 keypoints per image
2. Describe keypoints
• 256 bits (32 bytes) per keypoint
3. Match keypoints
• Hamming distance < THRESHOLD
• Locality Sensitive Hashing (LSH)
4. Match images
• RANSAC and homography
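Step 3 is where scale becomes the problem: 5,000,000 images × 500 keypoints is 2.5 billion descriptors, and at 32 bytes each that is about 80 GB, so comparing every pair is out of the question. The deck names LSH without saying which family. The sketch below assumes bit sampling, the standard LSH family for Hamming distance: each table keys a descriptor by a fixed random subset of its bits, so descriptors that differ in only a few bits are likely to share a bucket in at least one table. The table count, key length and class name are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

N_BITS = 256  # descriptor length from the slide


class BitSamplingLSH:
    """Bit-sampling LSH index for binary descriptors (sketch; parameters are assumptions)."""

    def __init__(self, n_tables: int = 8, bits_per_key: int = 24, seed: int = 0):
        rng = random.Random(seed)
        # Each table hashes a descriptor by a fixed random subset of its bit positions.
        self.positions = [rng.sample(range(N_BITS), bits_per_key) for _ in range(n_tables)]
        self.tables: List[Dict[Tuple[int, ...], List[int]]] = [
            defaultdict(list) for _ in range(n_tables)
        ]

    def _key(self, descriptor: int, table: int) -> Tuple[int, ...]:
        # The bucket key is the tuple of sampled bits.
        return tuple((descriptor >> p) & 1 for p in self.positions[table])

    def add(self, item_id: int, descriptor: int) -> None:
        for t in range(len(self.tables)):
            self.tables[t][self._key(descriptor, t)].append(item_id)

    def candidates(self, descriptor: int) -> Set[int]:
        """IDs that share a bucket with `descriptor` in at least one table."""
        found: Set[int] = set()
        for t in range(len(self.tables)):
            found.update(self.tables[t].get(self._key(descriptor, t), []))
        return found
```

Longer keys yield fewer false candidates but miss more true neighbours; more tables win that recall back at the cost of memory. Every candidate would still be confirmed with the exact Hamming test from step 3.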
Keypoint detection
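The deck does not name the detector. The figures above (500 keypoints per image, 256-bit descriptors) happen to match the defaults of OpenCV's ORB, whose detector is FAST. That is an observation, not a statement about Homeheed's code. A minimal sketch of the FAST segment test on a grayscale image stored as a list of rows: a pixel is a corner if at least `n` contiguous pixels on a radius-3 circle around it are all brighter, or all darker, than the center by more than `t`. The values `t=20` and `n=9` are assumptions.

```python
from typing import List, Tuple

# The 16 pixels of a Bresenham circle of radius 3, in clockwise order, as (dx, dy).
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img: List[List[int]], x: int, y: int, t: int = 20, n: int = 9) -> bool:
    """FAST segment test: n contiguous circle pixels all brighter or all darker by more than t."""
    center = img[y][x]
    # Label each circle pixel: +1 brighter, -1 darker, 0 similar.
    labels = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        labels.append(1 if v > center + t else -1 if v < center - t else 0)
    # Walk the circle twice so that runs wrapping past index 0 are counted.
    run, prev = 0, 0
    for lab in labels + labels:
        if lab != 0 and lab == prev:
            run += 1
        else:
            run = 1 if lab != 0 else 0
        prev = lab
        if run >= n:
            return True
    return False


def detect_fast(img: List[List[int]], t: int = 20, n: int = 9) -> List[Tuple[int, int]]:
    """All (x, y) at least 3 pixels from the border that pass the segment test."""
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, t, n)]
```

ORB additionally ranks FAST corners by a Harris score to keep the best N and assigns each an orientation; neither step is shown here.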
Keypoint description
256 bits (16 shown)
1 0 0 0 1 1 0 1 0 1 0 0 1 1 0 0
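The deck does not say how the 256 bits are computed. BRIEF-style binary descriptors, which ORB builds on, set each bit by comparing the intensities of a fixed pair of pixels around the keypoint. A minimal sketch under stated assumptions: a 31×31 patch, uniformly random pairs drawn once from a fixed seed, and no smoothing or rotation. Real ORB uses learned pairs steered by the keypoint's orientation. All names are hypothetical.

```python
import random
from typing import List

N_BITS = 256       # descriptor length from the slide
PATCH_RADIUS = 15  # 31x31 patch around the keypoint (assumption)

_rng = random.Random(42)
# Fixed sampling pattern shared by all keypoints: 256 pairs (dx1, dy1, dx2, dy2).
PAIRS = [tuple(_rng.randint(-PATCH_RADIUS, PATCH_RADIUS) for _ in range(4))
         for _ in range(N_BITS)]


def describe(img: List[List[int]], x: int, y: int) -> int:
    """256-bit descriptor as a Python int; bit i is 1 if I(p_i) < I(q_i)."""
    desc = 0
    for i, (dx1, dy1, dx2, dy2) in enumerate(PAIRS):
        if img[y + dy1][x + dx1] < img[y + dy2][x + dx2]:
            desc |= 1 << i
    return desc


def to_bytes(desc: int) -> bytes:
    """Pack the descriptor into 32 bytes (256 bits / 8)."""
    return desc.to_bytes(N_BITS // 8, "big")
```

Because each bit is only a comparison, adding the same brightness offset to every pixel leaves the descriptor unchanged, which helps when the same room is photographed with different exposure.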
Keypoint matching
Keypoint 1 (256 bits, 16 shown): 1 0 0 0 1 1 0 1 0 1 0 0 1 1 0 0
Keypoint 2 (256 bits, 16 shown): 1 0 1 0 1 1 0 1 0 0 0 0 1 1 0 0
XOR result:                      0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
Hamming distance (number of 1s in the XOR result): 2 <= 3 (constant THRESHOLD) => these keypoints are equal
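The XOR-and-count computation above, as a minimal Python sketch on the 16 bits shown on the slide. Descriptors are held as Python ints. A 32-byte descriptor stored as `bytes` can be converted first with `int.from_bytes(d, "big")`. The function name is hypothetical. The comparison follows the `< THRESHOLD` of step 3. The example passes under that test and under this slide's `<=` alike.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance: the number of 1 bits in a XOR b."""
    return bin(a ^ b).count("1")  # on Python 3.10+, (a ^ b).bit_count() is equivalent


THRESHOLD = 3  # the constant from the slide

k1 = 0b1000110101001100  # the 16 bits of keypoint 1 shown on the slide
k2 = 0b1010110100001100  # the 16 bits of keypoint 2
print(format(k1 ^ k2, "016b"))      # 0010000001000000
print(hamming(k1, k2))              # 2
print(hamming(k1, k2) < THRESHOLD)  # True
```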
Keypoint matching
I think this keypoint is equal to that keypoint
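Without any index, "this keypoint is equal to that keypoint" can be read as a nearest-neighbour search: for each descriptor of one image, find the closest descriptor of the other and accept the pair if it passes the Hamming threshold. A brute-force sketch with hypothetical names. It costs |A|·|B| comparisons, which is what LSH avoids at the scale of 5,000,000 images. Practical matchers often add a cross-check or ratio test to drop ambiguous matches; those are not shown.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def match_keypoints(desc_a: List[int], desc_b: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Pair each descriptor in A with its nearest descriptor in B if the distance < threshold."""
    matches: List[Tuple[int, int]] = []
    if not desc_b:
        return matches
    for i, da in enumerate(desc_a):
        # Nearest neighbour in B by Hamming distance.
        j, dist = min(((j, hamming(da, db)) for j, db in enumerate(desc_b)), key=lambda t: t[1])
        if dist < threshold:
            matches.append((i, j))
    return matches
```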
Image matching
I think the left image is equal to this part of the right image
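Step 4 names RANSAC and a homography. A minimal pure-Python sketch of that combination: fit an exact homography to 4 randomly chosen keypoint matches (direct linear transform with h33 fixed to 1), count how many other matches it reprojects within `tol` pixels, and keep the best model. The iteration count, tolerance and function names are assumptions. Library implementations such as OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag also refit on all inliers and handle degenerate samples more carefully.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Homography = List[float]  # [h11, h12, h13, h21, h22, h23, h31, h32], h33 = 1


def _solve(a: List[List[float]], b: List[float]) -> Optional[List[float]]:
    """Gaussian elimination with partial pivoting; None if the system is singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_homography(src: Sequence[Point], dst: Sequence[Point]) -> Optional[Homography]:
    """Exact homography from exactly 4 point correspondences."""
    assert len(src) == 4 and len(dst) == 4
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return _solve(a, b)


def project(h: Homography, p: Point) -> Point:
    x, y = p
    w = h[6] * x + h[7] * y + 1
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


def ransac_homography(src, dst, iters: int = 500, tol: float = 3.0, seed: int = 0):
    """Return (best homography, indices of matches it explains) over random 4-point samples."""
    rng = random.Random(seed)
    best_h, best_inliers = None, []
    for _ in range(iters):
        idx = rng.sample(range(len(src)), 4)
        h = fit_homography([src[i] for i in idx], [dst[i] for i in idx])
        if h is None:  # degenerate sample, e.g. collinear points
            continue
        inliers = []
        for i, (p, q) in enumerate(zip(src, dst)):
            try:
                u, v = project(h, p)
            except ZeroDivisionError:
                continue
            if (u - q[0]) ** 2 + (v - q[1]) ** 2 < tol ** 2:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

How many inliers are needed before "the left image is equal to this part of the right image" is accepted is a further threshold the deck does not give.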
The image-based approach (recap)
✓ 1. Find equal images (presented up to now)
2. Given 2 listings, classify them as equal or non-equal
3. Is this home available today?
Given 2 listings, classify them as equal or non-equal
• Random forest
• Feature sources:
  • Image matches (each listing has about 10 images)
  • Uniqueness of the images
  • Numeric data (price, square meters, floor, year, …)
  • Category parameters (neighborhood, apartment type, build type, …)
  • Text data (bag of words from the description)
• Features are differences, not absolute values
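"Features are differences, not absolute values" means each training example describes a pair of listings, not a single listing. A minimal sketch of building such a pair vector from the sources listed above. The field names, the use of absolute plus relative differences, and Jaccard similarity for the bags of words are assumptions. The image-match count and image-uniqueness score are taken as precomputed inputs: presumably a photo that recurs across many unrelated listings, such as a stock image, is weaker evidence. Vectors like these would then be fed to a random-forest classifier (for example scikit-learn's `RandomForestClassifier`).

```python
from typing import Dict, List, Set


def bag_of_words(text: str) -> Set[str]:
    return set(text.lower().split())


def pair_features(a: Dict, b: Dict, image_matches: int, image_uniqueness: float) -> List[float]:
    """Symmetric feature vector for a pair of listings (hypothetical field names)."""
    feats = [float(image_matches), image_uniqueness]
    # Numeric data: absolute and relative differences.
    for key in ("price", "sqm", "floor", "year"):
        diff = abs(a[key] - b[key])
        scale = max(abs(a[key]), abs(b[key]), 1)
        feats += [float(diff), diff / scale]
    # Category parameters: 1.0 if the values agree, else 0.0.
    for key in ("neighborhood", "apartment_type", "build_type"):
        feats.append(1.0 if a[key] == b[key] else 0.0)
    # Text data: Jaccard similarity of the two bags of words.
    wa, wb = bag_of_words(a["description"]), bag_of_words(b["description"])
    feats.append(len(wa & wb) / len(wa | wb) if wa | wb else 0.0)
    return feats
```

Using differences keeps the model from learning, say, that expensive listings tend to be duplicates. It only sees how far apart two listings are, and the vector is the same whichever listing comes first.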
Is this home available today?
Disclaimers:
• 3 shades of fakeness
  • Available means “you can get into that home today”
  • Fake means “this home has never existed”
  • Outdated means “this home is already rented/sold”
• Classification per day
  • Without a date, “Is this available?” can be both True and False (Schrödinger’s listing)
  • A user looking at a snapshot of a listing (just today) misses most of the information
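In data terms, classifying per day means the label belongs to a (home, date) pair rather than to the home alone. A minimal sketch, assuming each home's history has already been resolved into labeled date intervals. How those intervals are inferred from the signals on the next slide is not described in the deck and is not shown. The class, function and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Interval:
    start: date
    end: Optional[date]  # None means the interval is still open
    label: str           # "available", "outdated" or "fake"


def status_on(history: List[Interval], day: date) -> str:
    """Label of a home on a given day; 'unknown' if no interval covers that day."""
    for iv in history:
        if iv.start <= day and (iv.end is None or day <= iv.end):
            return iv.label
    return "unknown"


# Hypothetical home: on the market from January, rented out in early March.
history = [
    Interval(date(2018, 1, 10), date(2018, 3, 1), "available"),
    Interval(date(2018, 3, 2), None, "outdated"),
]
print(status_on(history, date(2018, 2, 1)))  # available
print(status_on(history, date(2018, 6, 1)))  # outdated
```

The same home answers "available" in February and "outdated" in June, which is why the question has no single answer without a date.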
Is this home available today?
2 main signals
• Home history (new and removed listings)
• Lister reputation (how much I trust this guy)
Ground truth dataset
• Manual labeling of auto-generated candidates
• The “Book a showing” feature of Homeheed
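The deck lists the two signals but not how they are computed or combined. As one hypothetical way to turn ground-truth labels into the lister-reputation signal, the sketch scores each lister by the share of their labeled listings that turned out to be available. It applies add-one (Laplace) smoothing, so a lister with little history stays near 0.5 instead of jumping to 0 or 1. The function name and the smoothing choice are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def lister_reputation(labels: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """labels: (lister_id, was_available) pairs from the ground truth dataset."""
    good: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lister, available in labels:
        total[lister] += 1
        good[lister] += int(available)
    # Laplace smoothing: (good + 1) / (total + 2).
    return {lister: (good[lister] + 1) / (total[lister] + 2) for lister in total}


rep = lister_reputation([("a", True), ("a", True), ("a", True),
                         ("b", False), ("b", False), ("b", True)])
print(rep)  # {'a': 0.8, 'b': 0.4}
```

A lister absent from the ground truth can be given the prior directly, e.g. `rep.get(lister_id, 0.5)`.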
QUESTIONS…