SlideShare a Scribd company logo
1 of 23
Download to read offline
1st edition
November 4-5, 2018
Machine Learning School in Doha
BigML, Inc X
Real-world Use Case
House Recommender
Full Name
Role, Company
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Let’s build a recommender
Typical way to shop for a home…
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Recommender Idea
?
?
?
?
Preference
Model
Preference
Data
Sample
… then use the Preference Model to
filter all the homes on the market
All Homes
Forsale
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Recommender Problem #1
What if there are really unusual homes in the data?
• A mansion with 20 bathrooms
• A home with no bedrooms
• A lot size that is smaller than the home?
We don’t want to show these as suggestions
because they are unusual…. How do we detect
anomalies?
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Anomaly Detection
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
What just happened?
• We wanted to find and remove unusual houses.
• We created an Anomaly Detector and examined
the top anomalies.
• We found some unusual houses to remove and
discovered bad data (missing values) that we want
to fix.
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
ML to fix missing data…
• Let’s use Machine Learning…
BEDS BATHSWhat happened to 

being easy?
SQFT PRICE BEDS BATHS
3.125 US$530.000 5 3
2.100 US$460.000 2
1.200 US$250.000 3
3.950 US$610.000 6 4
4
1.5
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
WhizzML
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
What just happened?
• We had a Dataset with missing values.
• We wanted to apply an algorithm to fix the missing
values with Machine Learning
• Rather than write the algorithm, we found what we
needed in the WhizzML public gallery.
• Now that we have cloned the Script we can use it
again and again.
• We can write new ones too!
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Recommender Problem #2
• How can we avoid showing essentially the
same house over and over?
All Homes
?
?
?
Sample
Modern
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Recommender Problem #2
• How can we avoid showing essentially the
same house over and over?
All Homes
Modern
Lots of
Land
• Great! What if we don’t know how to group
them? Or how many groups?
?
sample
?
sample
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Clustering
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
What just happened?
• Since we don’t know how many groups of homes
there should be, we used G-means Clustering to find
the optimum number of groups of homes
• Our recommender will use these groups to create a
better sampling for user preference
• We also tried to understand the home Clusters using
“model clusters” but the models were difficult to
interpret
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Understanding Clusters
If SQFT >= 3,125 THEN “Cluster 1”
What if we could get rules like…
SQFT PRICE BEDS BATHS CLUSTER
3.125 US$530.000 5 3 Cluster 1
2.100 US$460.000 4 2 Cluster 3
1.200 US$250.000 3 1,5 Cluster 5
3.950 US$610.000 6 4 Cluster 1
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Association Discovery
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
What just happened?
• We used a Batch Centroid to add the Cluster
assignment of each home as a feature to the Dataset
• We use Association Discovery to find “interesting”
relationships between the features including the
Cluster assignment
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Recommender Problem #3
There is much more interesting information than just the
number of BEDS, BATHS, etc.
• Unfortunately, these "remarks" are not available in the
Redfin download
• Adding them to our dataset requires crawling the
website
• Like most ML projects, preparing the data is 80% of
the difficulty (fortunately I already did it!)
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Topic Modeling
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
What just happened?
• We extending the home dataset with the syndicated
remarks text field
• We built a Model to predict sale price and explored how
key words discovered in the remarks impacted price
• We used Topic Modeling to create a deeper thematic
understanding of the remarks
• Homes that are "in-town" or "out-of-town"
• We extended the Dataset with fields that represent for
each home how related they are to each of these topics
• This will allow our Clustering to group homes by a deeper
meaning than just BEDS, BATHS, etc
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
Recommender Idea
?
?
Modern
Lots of
Land
Small
?
?
?
?
Preference
Model
Preference
Data
BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 ·
House Recommender
MLSD18. Real World Use Case II

More Related Content

What's hot

MLSD18. Data Cleaning
MLSD18. Data CleaningMLSD18. Data Cleaning
MLSD18. Data CleaningBigML, Inc
 
MLSD18 Evaluations
MLSD18 EvaluationsMLSD18 Evaluations
MLSD18 EvaluationsBigML, Inc
 
MLSD18. Supervised Workshop
MLSD18. Supervised WorkshopMLSD18. Supervised Workshop
MLSD18. Supervised WorkshopBigML, Inc
 
MLSD18. Summary of Morning Sessions
MLSD18. Summary of Morning SessionsMLSD18. Summary of Morning Sessions
MLSD18. Summary of Morning SessionsBigML, Inc
 
MLSD18. Basic Transformations - QCRI
MLSD18. Basic Transformations - QCRIMLSD18. Basic Transformations - QCRI
MLSD18. Basic Transformations - QCRIBigML, Inc
 
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature EngineeringBigML, Inc
 
MLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning WorkflowsMLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning WorkflowsBigML, Inc
 
BigML Summer 2017 Release
BigML Summer 2017 ReleaseBigML Summer 2017 Release
BigML Summer 2017 ReleaseBigML, Inc
 
BSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBigML, Inc
 
VSSML18. Feature Engineering
VSSML18. Feature EngineeringVSSML18. Feature Engineering
VSSML18. Feature EngineeringBigML, Inc
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsBigML, Inc
 
BigML Release: PCA
BigML Release: PCABigML Release: PCA
BigML Release: PCABigML, Inc
 
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 DatasetGraph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 DatasetTigerGraph
 
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraph
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraphFROM DATAFRAMES TO GRAPH Data Science with pyTigerGraph
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraphTigerGraph
 
BSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 SessionsBSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 SessionsBigML, Inc
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering BigML, Inc
 
BSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBigML, Inc
 
VSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature EngineeringVSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature EngineeringBigML, Inc
 
MLSEV. Use Case: The All-in-One Data Warehouse and Machine Learning
MLSEV. Use Case: The All-in-One Data Warehouse and Machine LearningMLSEV. Use Case: The All-in-One Data Warehouse and Machine Learning
MLSEV. Use Case: The All-in-One Data Warehouse and Machine LearningBigML, Inc
 

What's hot (20)

MLSD18. Data Cleaning
MLSD18. Data CleaningMLSD18. Data Cleaning
MLSD18. Data Cleaning
 
MLSD18 Evaluations
MLSD18 EvaluationsMLSD18 Evaluations
MLSD18 Evaluations
 
MLSD18. Supervised Workshop
MLSD18. Supervised WorkshopMLSD18. Supervised Workshop
MLSD18. Supervised Workshop
 
MLSD18. Summary of Morning Sessions
MLSD18. Summary of Morning SessionsMLSD18. Summary of Morning Sessions
MLSD18. Summary of Morning Sessions
 
MLSD18. Basic Transformations - QCRI
MLSD18. Basic Transformations - QCRIMLSD18. Basic Transformations - QCRI
MLSD18. Basic Transformations - QCRI
 
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature Engineering
 
MLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning WorkflowsMLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning Workflows
 
BigML Summer 2017 Release
BigML Summer 2017 ReleaseBigML Summer 2017 Release
BigML Summer 2017 Release
 
BSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBSSML17 - API and WhizzML
BSSML17 - API and WhizzML
 
VSSML18. Feature Engineering
VSSML18. Feature EngineeringVSSML18. Feature Engineering
VSSML18. Feature Engineering
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 Sessions
 
BigML Release: PCA
BigML Release: PCABigML Release: PCA
BigML Release: PCA
 
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 DatasetGraph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
 
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraph
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraphFROM DATAFRAMES TO GRAPH Data Science with pyTigerGraph
FROM DATAFRAMES TO GRAPH Data Science with pyTigerGraph
 
BSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 SessionsBSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 Sessions
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering
 
TigerGraph.js
TigerGraph.jsTigerGraph.js
TigerGraph.js
 
BSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBSSML16 L7. Feature Engineering
BSSML16 L7. Feature Engineering
 
VSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature EngineeringVSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature Engineering
 
MLSEV. Use Case: The All-in-One Data Warehouse and Machine Learning
MLSEV. Use Case: The All-in-One Data Warehouse and Machine LearningMLSEV. Use Case: The All-in-One Data Warehouse and Machine Learning
MLSEV. Use Case: The All-in-One Data Warehouse and Machine Learning
 

Similar to MLSD18. Real World Use Case II

VSSML18. Advanced WhizzML Workflows
VSSML18. Advanced WhizzML WorkflowsVSSML18. Advanced WhizzML Workflows
VSSML18. Advanced WhizzML WorkflowsBigML, Inc
 
DutchMLSchool. Introduction to Machine Learning with the BigML Platform
DutchMLSchool. Introduction to Machine Learning with the BigML PlatformDutchMLSchool. Introduction to Machine Learning with the BigML Platform
DutchMLSchool. Introduction to Machine Learning with the BigML PlatformBigML, Inc
 
BigMLSchool: My First End-to-End Machine Learning Project
BigMLSchool: My First End-to-End Machine Learning ProjectBigMLSchool: My First End-to-End Machine Learning Project
BigMLSchool: My First End-to-End Machine Learning ProjectBigML, Inc
 
DutchMLSchool. Your first BigML Project
DutchMLSchool. Your first BigML ProjectDutchMLSchool. Your first BigML Project
DutchMLSchool. Your first BigML ProjectBigML, Inc
 
DutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business PerspectiveDutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business PerspectiveBigML, Inc
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLBigML, Inc
 
Digital transformation
Digital transformationDigital transformation
Digital transformationScopernia
 
DutchMLSchool. Supervised vs Unsupervised Learning
DutchMLSchool. Supervised vs Unsupervised LearningDutchMLSchool. Supervised vs Unsupervised Learning
DutchMLSchool. Supervised vs Unsupervised LearningBigML, Inc
 
20171023 5 Lifehacks: How to Analyze a Pack of Websites
20171023 5 Lifehacks: How to Analyze a Pack of Websites20171023 5 Lifehacks: How to Analyze a Pack of Websites
20171023 5 Lifehacks: How to Analyze a Pack of WebsitesMichaela Linhart
 
DutchMLSchool. Machine Learning: Why Now?
DutchMLSchool. Machine Learning: Why Now? DutchMLSchool. Machine Learning: Why Now?
DutchMLSchool. Machine Learning: Why Now? BigML, Inc
 
Artificial Intelligence in Real Estate - 3 Ways AI can Drive Savings
Artificial Intelligence in Real Estate - 3 Ways AI can Drive SavingsArtificial Intelligence in Real Estate - 3 Ways AI can Drive Savings
Artificial Intelligence in Real Estate - 3 Ways AI can Drive SavingsDaniel Faggella
 
2013 - Yhat - YC app.pdf
2013 - Yhat - YC app.pdf2013 - Yhat - YC app.pdf
2013 - Yhat - YC app.pdfAustin Ogilvie
 
DutchMLSchool. Logistic Regression, Deepnets, Time Series
DutchMLSchool. Logistic Regression, Deepnets, Time SeriesDutchMLSchool. Logistic Regression, Deepnets, Time Series
DutchMLSchool. Logistic Regression, Deepnets, Time SeriesBigML, Inc
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...Alok Singh
 
House price prediction
House price predictionHouse price prediction
House price predictionKaranseth30
 
"What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual..."What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual...Dataconomy Media
 
Jared Smith - MeasureCamp Austin 2020 Presentation
Jared Smith - MeasureCamp Austin 2020 PresentationJared Smith - MeasureCamp Austin 2020 Presentation
Jared Smith - MeasureCamp Austin 2020 PresentationJaredSmith181
 
How Digital Marketer Uses Google Tag Manager To Improve Sales & Leads
How Digital Marketer Uses Google Tag Manager To Improve Sales & LeadsHow Digital Marketer Uses Google Tag Manager To Improve Sales & Leads
How Digital Marketer Uses Google Tag Manager To Improve Sales & LeadsMeasurementMarketing.io
 

Similar to MLSD18. Real World Use Case II (20)

VSSML18. Advanced WhizzML Workflows
VSSML18. Advanced WhizzML WorkflowsVSSML18. Advanced WhizzML Workflows
VSSML18. Advanced WhizzML Workflows
 
DutchMLSchool. Introduction to Machine Learning with the BigML Platform
DutchMLSchool. Introduction to Machine Learning with the BigML PlatformDutchMLSchool. Introduction to Machine Learning with the BigML Platform
DutchMLSchool. Introduction to Machine Learning with the BigML Platform
 
BigMLSchool: My First End-to-End Machine Learning Project
BigMLSchool: My First End-to-End Machine Learning ProjectBigMLSchool: My First End-to-End Machine Learning Project
BigMLSchool: My First End-to-End Machine Learning Project
 
DutchMLSchool. Your first BigML Project
DutchMLSchool. Your first BigML ProjectDutchMLSchool. Your first BigML Project
DutchMLSchool. Your first BigML Project
 
DutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business PerspectiveDutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business Perspective
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
Digital transformation
Digital transformationDigital transformation
Digital transformation
 
DutchMLSchool. Supervised vs Unsupervised Learning
DutchMLSchool. Supervised vs Unsupervised LearningDutchMLSchool. Supervised vs Unsupervised Learning
DutchMLSchool. Supervised vs Unsupervised Learning
 
20171023 5 Lifehacks: How to Analyze a Pack of Websites
20171023 5 Lifehacks: How to Analyze a Pack of Websites20171023 5 Lifehacks: How to Analyze a Pack of Websites
20171023 5 Lifehacks: How to Analyze a Pack of Websites
 
Everwrite pitch deck
Everwrite pitch deckEverwrite pitch deck
Everwrite pitch deck
 
DutchMLSchool. Machine Learning: Why Now?
DutchMLSchool. Machine Learning: Why Now? DutchMLSchool. Machine Learning: Why Now?
DutchMLSchool. Machine Learning: Why Now?
 
Artificial Intelligence in Real Estate - 3 Ways AI can Drive Savings
Artificial Intelligence in Real Estate - 3 Ways AI can Drive SavingsArtificial Intelligence in Real Estate - 3 Ways AI can Drive Savings
Artificial Intelligence in Real Estate - 3 Ways AI can Drive Savings
 
2013 - Yhat - YC app.pdf
2013 - Yhat - YC app.pdf2013 - Yhat - YC app.pdf
2013 - Yhat - YC app.pdf
 
DutchMLSchool. Logistic Regression, Deepnets, Time Series
DutchMLSchool. Logistic Regression, Deepnets, Time SeriesDutchMLSchool. Logistic Regression, Deepnets, Time Series
DutchMLSchool. Logistic Regression, Deepnets, Time Series
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
 
House price prediction
House price predictionHouse price prediction
House price prediction
 
"What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual..."What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual...
 
Jared Smith - MeasureCamp Austin 2020 Presentation
Jared Smith - MeasureCamp Austin 2020 PresentationJared Smith - MeasureCamp Austin 2020 Presentation
Jared Smith - MeasureCamp Austin 2020 Presentation
 
Generative AI: The New Wild West of SEO - Ryan Huser, Ayima
Generative AI: The New Wild West of SEO - Ryan Huser, AyimaGenerative AI: The New Wild West of SEO - Ryan Huser, Ayima
Generative AI: The New Wild West of SEO - Ryan Huser, Ayima
 
How Digital Marketer Uses Google Tag Manager To Improve Sales & Leads
How Digital Marketer Uses Google Tag Manager To Improve Sales & LeadsHow Digital Marketer Uses Google Tag Manager To Improve Sales & Leads
How Digital Marketer Uses Google Tag Manager To Improve Sales & Leads
 

More from BigML, Inc

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingBigML, Inc
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationBigML, Inc
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceBigML, Inc
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesBigML, Inc
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector BigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionBigML, Inc
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyBigML, Inc
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorBigML, Inc
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsBigML, Inc
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsBigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleBigML, Inc
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIBigML, Inc
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object DetectionBigML, Inc
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image ProcessingBigML, Inc
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureBigML, Inc
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorBigML, Inc
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotBigML, Inc
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...BigML, Inc
 
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceBigML, Inc
 

More from BigML, Inc (20)

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly Detection
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven Company
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal Sector
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe Stadiums
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at Scale
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AI
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image Processing
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
 
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
 

Recently uploaded

April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 

MLSD18. Real World Use Case II

  • 1. 1st edition November 4-5, 2018 Machine Learning School in Doha
  • 2. BigML, Inc X Real-world Use Case House Recommender Full Name Role, Company
  • 3. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Let’s build a recommender Typical way to shop for a home…
  • 4. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Recommender Idea ? ? ? ? Preference Model Preference Data Sample … then use the Preference Model to filter all the homes on the market All Homes Forsale
  • 5. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Recommender Problem #1 What if there are really unusual homes in the data? • A mansion with 20 bathrooms • A home with no bedrooms • A lot size that is smaller than the home? We don’t want to show these as suggestions because they are unusual…. How do we detect anomalies?
  • 6. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Anomaly Detection
  • 7. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · What just happened? • We wanted to find and remove unusual houses. • We created an Anomaly Detector and examined the top anomalies. • We found some unusual houses to remove and discovered bad data (missing values) that we want to fix.
  • 8. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · ML to fix missing data… • Let’s use Machine Learning… BEDS BATHSWhat happened to being easy? SQFT PRICE BEDS BATHS 3.125 US$530.000 5 3 2.100 US$460.000 2 1.200 US$250.000 3 3.950 US$610.000 6 4 4 1.5
  • 9. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · WhizzML
  • 10. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · What just happened? • We had a Dataset with missing values. • We wanted to apply an algorithm to fix the missing values with Machine Learning • Rather than write the algorithm, we found what we needed in the WhizzML public gallery. • Now that we have cloned the Script we can use it again and again. • We can write new ones too!
  • 11. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Recommender Problem #2 • How can we avoid showing essentially the same house over and over? All Homes ? ? ? Sample Modern
  • 12. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Recommender Problem #2 • How can we avoid showing essentially the same house over and over? All Homes Modern Lots of Land • Great! What if we don’t know how to group them? Or how many groups? ? sample ? sample
  • 13. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Clustering
  • 14. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · What just happened? • Since we don’t know how many groups of homes there should be, we used G-means Clustering to find the optimum number of groups of homes • Our recommender will use these groups to create a better sampling for user preference • We also tried to understand the home Clusters using “model clusters” but the models were difficult to interpret
  • 15. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Understanding Clusters If SQFT >= 3,125 THEN “Cluster 1” What if we could get rules like… SQFT PRICE BEDS BATHS CLUSTER 3.125 US$530.000 5 3 Cluster 1 2.100 US$460.000 4 2 Cluster 3 1.200 US$250.000 3 1,5 Cluster 5 3.950 US$610.000 6 4 Cluster 1
  • 16. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Association Discovery
  • 17. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · What just happened? • We used a Batch Centroid to add the Cluster assignment of each home as a feature to the Dataset • We use Association Discovery to find “interesting” relationships between the features including the Cluster assignment
  • 18. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Recommender Problem #3 There is much more interesting information than just the number of BEDS, BATHS, etc. • Unfortunately, these "remarks" are not available in the Redfin download • Adding them to our dataset requires crawling the website • Like most ML projects, preparing the data is 80% of the difficulty (fortunately I already did it!)
  • 19. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Topic Modeling
  • 20. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · What just happened? • We extending the home dataset with the syndicated remarks text field • We built a Model to predict sale price and explored how key words discovered in the remarks impacted price • We used Topic Modeling to create a deeper thematic understanding of the remarks • Homes that are "in-town" or "out-of-town" • We extended the Dataset with fields that represent for each home how related they are to each of these topics • This will allow our Clustering to group homes by a deeper meaning than just BEDS, BATHS, etc
  • 21. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · Recommender Idea ? ? Modern Lots of Land Small ? ? ? ? Preference Model Preference Data
  • 22. BigML, Inc X· @bigmlcom · @QatarComputing · #MLSD18 · House Recommender