Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth in Data Science Whilst Migrating to Spark+Cloud all at the Same Time with Matt Fryer

In the last year Hotels.com has begun its journey to becoming an algorithmic business. Matt will talk about the team's experience of exponential growth in data science algorithms while, at the same time, migrating from SAS / SQL to Spark as the core underlying architecture and from on-premise to the cloud, transforming the capability of the data science function. He will also highlight the key enablers that have made this successful, including CEO support, the internal concepts of organic intelligence, and how Databricks has helped make this happen, as well as the pitfalls on the journey.

Published in: Data & Analytics

  1. Confidential - do not distribute. Hotels.com’s journey to becoming an Algorithmic Business. Matthew Fryer, VP, Chief Data Science Officer, mfryer@hotels.com
  2. Hotels.com: Part of Expedia, Inc. family
     • 385,000 properties
     • 89 countries
     • 39 languages
     • >27m Hotels.com Rewards Members
     • Home of Captain Obvious
     • Billions of recommendations per day, based on real-time data
  3. [no transcribed text]
  4. [no transcribed text]
  5. Data Science Engineering Front End Development
  6. “Artificial Intelligence Will Be Travel’s Next Big Thing”: Barry Diller, Chairman & Senior Executive, Expedia, Inc. 3M’s are disruptive technology: Mobile; Messaging / NLP; Machine Learning
  7. [no transcribed text]
  8. Our overall ecosystem
  9. Core Elements of our Data Science Cloud Platform
     • Databricks Unified Platform
     • Maestro: our internally developed platform on AWS (EMR, Spark, R-Studio, Intellij, SBT, Jupyter, Zeppelin, Unit / QA, Metastore, Apache Airflow, Keras, Tensorflow)
     • Proof of concept on Google Cloud, Beam, Spark & Tensorflow
  10. Databricks Unified Platform. [Chart: 1-hour blocks; y axis = number of 32-core instances]
     • Key asset to the success of data science at Hotels.com
     • Key in driving up data scientist productivity / efficiency / flexibility
     • Helps make our data science lifecycle operate much more easily and quickly, driving speed to market
     • Reliable / secure, and facilitates ‘Highly Elastic’ workflows exploiting cost-effective spot instances on AWS
  11. ALPs – Algorithm Lifecycle Pipeline Service
  12. Images are an important factor while choosing a hotel. [Chart: factors other than price/location rated Very Important / Important, axis 0% to 80%; categories: Loyalty Program, Reviews, Hotel Brand, Star Rating, Destination Info, Images, Hotel Info] Reference: The Influence of Visuals in Online Hotel Research and Booking Behaviour
  13. Computer Vision problems we try to tackle: Near Duplicate Detection; Scene Classification; Image Ranking
  14. Tagged as Bathroom
  15. GPUs quickly became key; it took a large effort to optimize using Keras + Tensorflow (Inception v3 + ResNet). [Chart, days on a log axis from 1 to 1000: values 493, 67, 20, 7, 4 across 12-CPU, 1-GPU, 1-GPU + limited cache, 16-GPU + limited cache, 16-GPU + full cache; series labels: CIFAR2, Expedia Small] [Second chart, days on an axis from 0 to 20: values 15 and 2.5 across 16-GPU + full cache and Optimized]
  16. Near Duplicate Detection: Real world examples
     • Non-Duplicates – probability 100%
     • Non-Duplicates – probability 95.91%
     • Duplicates – probability 97.98%
     • Duplicates – probability 98.43%
  17. Using the model: Real world examples. Labels shown: ROOM/BATHROOM, EXTERIOR/HOTEL, INTERIOR/SEATING_LOBBY, ROOM/LIVING_ROOM, ROOM/GUESTROOM, FACILITIES/DINING, INTERIOR/SEATING_LOBBY, FACILITIES/POOL
  18. Accuracy & Confusion Matrix
     • After many manual / long-winded iterations and regularization processes tuning hyperparameters
     • We achieved good accuracy and low confusion between classes
  19. Optimizing the photo order for improved customer experiences. Original Model. Reference: Radisson Blu Edwardian Berkshire Hotel, London
  20. Finding the right hotel in our marketplace is core to our customers’ needs.
  21. As an example, different user segments like to stay in different locations: Kensington, Bloomsbury, Heathrow, Canary Wharf, Paddington, Westminster, London City Airport, Chelsea, Battersea, Wimbledon, Wembley, City of London
  22. [Chart labels: Utility; Intent (click); just browsing!; BOOK!]
  23. Thank you. mfryer@hotels.com https://uk.linkedin.com/in/matthewfryer @mattfryer
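
Slide 18 mentions accuracy and a confusion matrix for the scene classifier without showing how they are computed. As a minimal sketch of those two measures, the Python below builds both from paired lists of true and predicted labels. The class names are borrowed from slide 17, but the label data, the function names, and the choice of plain Python are illustrative assumptions, not the Hotels.com implementation or its results.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]

def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0

# Hypothetical labels for six images (made up for illustration only).
classes = ["ROOM/BATHROOM", "ROOM/GUESTROOM", "FACILITIES/POOL"]
true_labels = ["ROOM/BATHROOM", "ROOM/BATHROOM", "ROOM/GUESTROOM",
               "ROOM/GUESTROOM", "ROOM/GUESTROOM", "FACILITIES/POOL"]
predicted_labels = ["ROOM/BATHROOM", "ROOM/GUESTROOM", "ROOM/GUESTROOM",
                    "ROOM/GUESTROOM", "ROOM/BATHROOM", "FACILITIES/POOL"]

matrix = confusion_matrix(true_labels, predicted_labels, classes)
for name, row in zip(classes, matrix):
    print(f"{name:16s} {row}")
print(f"accuracy = {accuracy(matrix):.3f}")
```

Off-diagonal cells show which pairs of scene classes the model mixes up (here, bathroom and guestroom), which is the kind of confusion the slide reports as low.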
