Confidential - do not distribute
Hotels.com’s journey to becoming an Algorithmic Business
Matthew Fryer
VP, Chief Data Science Officer
mfryer@hotels.com
Part of Expedia, Inc. family
385,000 properties
89 countries
39 languages
>27m Hotels.com Rewards Members
Home of Captain Obvious
Billions of recommendations per day, based on real-time data
Hotels.com
Data Science Engineering Front End Development
“Artificial Intelligence Will Be
Travel’s Next Big Thing”
Barry Diller
Chairman & Senior Executive,
Expedia, Inc.
The 3 M’s are disruptive technologies:
Mobile
Messaging / NLP
Machine Learning
Our overall ecosystem
Core Elements of our Data Science Cloud Platform
Databricks Unified Platform
Maestro – our internally developed platform on AWS
(EMR, Spark, RStudio, IntelliJ, SBT, Jupyter, Zeppelin, Unit / QA, Metastore, Apache Airflow, Keras, TensorFlow)
Proof of concept on Google Cloud, Beam, Spark & TensorFlow
Databricks Unified Platform
Chart is in 1-hour blocks; y-axis = number of 32-core instances
• Key asset to the success of data science at Hotels.com
• Key in driving up data scientist productivity / efficiency / flexibility
• Helps our data science lifecycle run more easily and quickly, improving speed to market
• Reliable / secure + facilitates ‘Highly Elastic’ workflows exploiting cost-effective spot instances on AWS
ALPs – Algorithm Lifecycle Pipeline Service
Reference: The Influence of Visuals in Online Hotel Research and Booking Behaviour
Images are an important factor while choosing a hotel
[Bar chart, x-axis 0% to 80%. Factors other than price/location, rated Important / Very Important: Loyalty Program, Reviews, Hotel Brand, Star Rating, Destination Info, Images, Hotel Info]
Computer Vision problems we try to tackle
Near Duplicate Detection
Scene Classification
Image Ranking
Tagged as Bathroom
GPUs quickly became key; it took a large effort to optimize using
Keras + TensorFlow (Inception v3 + ResNet)
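[Chart 1: Days (log scale, 1 to 1000) by hardware configuration; legend: CIFAR2, Expedia Small. Values in slide order: 12-CPU 493; 1-GPU 67; 1-GPU + limited cache 20; 16-GPU + limited cache 7; 16-GPU + full cache 4]
[Chart 2: Days (scale 0 to 20): 16-GPU + full cache 15; Optimized 2.5]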
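The figures on this slide can be read as relative speedups. The sketch below is only arithmetic on the numbers printed on the slide. Pairing each value with a hardware configuration is an assumption based on the order in which they appear, and the helper name speedups is hypothetical.

```python
# Days per hardware configuration as printed on the slide.
# ASSUMPTION: values are paired with configurations in slide order.
days = [
    ("12-CPU", 493),
    ("1-GPU", 67),
    ("1-GPU + limited cache", 20),
    ("16-GPU + limited cache", 7),
    ("16-GPU + full cache", 4),
]


def speedups(rows):
    """Return (config, days, speedup relative to the first row) per row."""
    baseline = rows[0][1]
    return [(name, d, baseline / d) for name, d in rows]


for name, d, s in speedups(days):
    print(f"{name:<24} {d:>4} days  {s:7.2f}x vs 12-CPU")

# Second chart: 15 days before and 2.5 days after further optimization.
optimized_gain = 15 / 2.5
print(f"Optimized run: {optimized_gain:.1f}x faster")
```

Under that pairing, the 16-GPU configuration with a full cache is roughly two orders of magnitude faster than the 12-CPU baseline, which matches the log-scale axis on the chart.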
Near Duplicate Detection: Real world examples
Non-Duplicates – probability 100%
Non-Duplicates – probability 95.91%
Duplicates – probability 97.98%
Duplicates – probability 98.43%
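The slides do not say how the duplicate probabilities are computed. As a minimal illustration of the general idea, the sketch below compares two image feature vectors with cosine similarity and applies a threshold. This is a simpler stand-in, not the Hotels.com model: the vectors, the 0.95 threshold, and the function names are all hypothetical, and a cosine score is not a calibrated probability like the figures above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def is_near_duplicate(a, b, threshold=0.95):
    """Flag a pair as near-duplicate when similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


# Hypothetical feature vectors; in practice these would come from a CNN
# such as the Inception v3 / ResNet models mentioned earlier in the deck.
photo_a = [0.90, 0.10, 0.30]
photo_b = [0.88, 0.12, 0.31]  # almost the same shot as photo_a
photo_c = [0.10, 0.90, 0.20]  # a different scene

print(is_near_duplicate(photo_a, photo_b))
print(is_near_duplicate(photo_a, photo_c))
```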
Using the model: Real world examples
ROOM/BATHROOM
EXTERIOR/HOTEL
INTERIOR/SEATING_LOBBY
ROOM/LIVING_ROOM
ROOM/GUESTROOM
FACILITIES/DINING
INTERIOR/SEATING_LOBBY
FACILITIES/POOL
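The tags above follow a GROUP/SCENE pattern. The sketch below shows one way such labels could be split and grouped, and how a top prediction could be picked from per-label scores. The label list is taken from this slide; the scores and the helper names are hypothetical, and nothing here reflects how the production model emits its tags.

```python
from collections import defaultdict

# Distinct labels shown on the slide.
LABELS = [
    "ROOM/BATHROOM",
    "EXTERIOR/HOTEL",
    "INTERIOR/SEATING_LOBBY",
    "ROOM/LIVING_ROOM",
    "ROOM/GUESTROOM",
    "FACILITIES/DINING",
    "FACILITIES/POOL",
]


def split_label(label):
    """Split 'GROUP/SCENE' into its two parts."""
    group, scene = label.split("/", 1)
    return group, scene


def group_labels(labels):
    """Map each top-level group to the scenes listed under it."""
    groups = defaultdict(list)
    for label in labels:
        group, scene = split_label(label)
        groups[group].append(scene)
    return dict(groups)


def top_label(scores):
    """Return the label with the highest score (scores: label -> float)."""
    return max(scores, key=scores.get)


# Hypothetical scores for a single photo.
example_scores = {"ROOM/BATHROOM": 0.91, "ROOM/GUESTROOM": 0.06, "FACILITIES/POOL": 0.03}
print(group_labels(LABELS))
print(top_label(example_scores))
```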
Accuracy & Confusion Matrix
• After many long-winded manual iterations of regularization and hyperparameter tuning
• We achieved good accuracy and little confusion between classes
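For readers unfamiliar with the terms, the sketch below builds a confusion matrix and derives accuracy from it. The class names are borrowed from the scene tags in this deck, but the true and predicted labels are invented for illustration and are not results from the talk.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Invented labels, for illustration only.
labels = ["BATHROOM", "GUESTROOM", "POOL"]
y_true = ["BATHROOM", "BATHROOM", "GUESTROOM", "GUESTROOM", "POOL", "POOL"]
y_pred = ["BATHROOM", "GUESTROOM", "GUESTROOM", "GUESTROOM", "POOL", "POOL"]

cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(f"{label:<10} {row}")
print(f"accuracy = {accuracy(cm):.3f}")
```

Off-diagonal counts are the "confusion": here one bathroom photo is mistaken for a guestroom, so the goal described on the slide corresponds to driving those off-diagonal cells toward zero.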
Optimizing the photo order for improved customer experiences
Original Model
Reference: Radisson Blu Edwardian Berkshire Hotel, London
Finding the right hotel in our marketplace is core to our customers’ needs.
As an example, different user segments like to stay in different locations
[Map labels: Kensington, Bloomsbury, Heathrow, Canary Wharf, Paddington, Westminster, London City Airport, Chelsea, Battersea, Wimbledon, Wembley, City of London]
[Diagram labels: Utility (shown three times); Intent (click), ranging from "just browsing!" to "BOOK!"]
Thank you
mfryer@hotels.com
https://uk.linkedin.com/in/matthewfryer
@mattfryer

Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth in Data Science Whilst Migrating to Spark+Cloud all at the Same Time with Matt Fryer
