L9. Real World Machine Learning - Cooking Predictions

Valencian Summer School 2015
Day 1
Lecture 9
Real World Machine Learning - Cooking Predictions
Andrés González (CleverTask)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2015

Published in: Data & Analytics

L9. Real World Machine Learning - Cooking Predictions

  1. Cooking Predictions: A real case in the hotel sector. Andrés González, Big Data Prediction Manager. andresg@clevertask.com. Twitter: @data_lytics
  2. CleverTask Solutions SL - Big Data Business Unit. Agenda: (1) Business Need, (2) “Cooking” Predictions, (3) Gathering ingredients, (4) Cleaning and Transforming, (5) The recipe (the model), (6) Tasting the dish
  3. Hotel Sector: • % room occupancy • Cancellation risk • Income
  4. Business Need: Predict the client’s NATIONALITY BEFORE the client checks in
  5. Staff Arrangement: Languages
  6. Prepare Activities
  7. Kitchen Arrangement
  8. Customize Stay
  9. In short, because… Details Make the Difference
  10. Machine Learning basics
  11. Machine Learning basics: Can you find patterns in this data?
  12. Machine Learning basics: Historical Data, Training, Prediction, New Data, Re-Training
  13. Agenda: (1) Business Need, (2) “Cooking” Predictions, (3) Gathering ingredients, (4) Cleaning and Transforming, (5) The recipe (the model), (6) Tasting the dish
  14. “Cooking” Predictions: Go to the market to buy ingredients, Cleaning, Transforming, Cooking, Tasting the Dish
  15. “Cooking” Predictions: Gathering RAW data, Cleaning Data, Transforming and Feature Engineering, Training the Model, Evaluating Prediction Quality
  16. Agenda: (1) Business Need, (2) “Cooking” Predictions, (3) Gathering ingredients, (4) Cleaning and Transforming, (5) The recipe (the model), (6) Tasting the dish
  17. Where does Data come from? RAW Data: Own Website, Partners’ Websites
  18. RAW Data: One year of historical reservation data (.xlsx file). Characteristics: • 260,000 reservations • 80 fields (57 categorical, 9 numeric, 10 date, 3 text, 1 incorrect field) • Size: 150 MB
  19. RAW Data
  20. Agenda: (1) Business Need, (2) “Cooking” Predictions, (3) Gathering ingredients, (4) Cleaning and Transforming, (5) The recipe (the model), (6) Tasting the dish
  21. The Process: (1) Gathering Data: “Dirty” RAW Data; (2) Cleaning: “Clean” Data; (3) Transformation and Feature Engineering: New Fields, Calculated Fields; (4) Model
  22. Data Cleaning
  23. Data Cleaning
  24. Data Cleaning
  25. Data Cleaning
  26. Data Cleaning
  27. Data Cleaning
  28. Data Cleaning. Row Deletion: • Reservations without check-in • Cancelled reservations • Rows with errors. Column Deletion: • IDs vs names • Columns with little data. Other Actions: • Format dates • Remove accents • Transform .xlsx -> .csv
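The cleaning actions on this slide can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the field names (`status`, `checkin_date`, `country`), the `"cancelled"` status value, and the `dd/mm/yyyy` input format are hypothetical, since the deck does not show the real schema or the tooling that was used.

```python
import unicodedata
from datetime import datetime


def strip_accents(text):
    """Remove diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_rows(rows, date_format="%d/%m/%Y"):
    """Return cleaned copies of the rows that survive the deletion rules.

    Field names and the input date format are assumptions for illustration.
    """
    cleaned = []
    for row in rows:
        if row.get("status") == "cancelled":
            continue  # cancelled reservation
        if not row.get("checkin_date"):
            continue  # reservation without check-in
        try:
            checkin = datetime.strptime(row["checkin_date"], date_format)
        except ValueError:
            continue  # row with errors (unparseable or impossible date)
        new_row = dict(row)
        new_row["checkin_date"] = checkin.date().isoformat()  # uniform date format
        new_row["country"] = strip_accents(row.get("country", ""))
        cleaned.append(new_row)
    return cleaned


sample = [
    {"status": "confirmed", "checkin_date": "01/07/2015", "country": "España"},
    {"status": "cancelled", "checkin_date": "02/07/2015", "country": "France"},
    {"status": "confirmed", "checkin_date": "", "country": "Italy"},
    {"status": "confirmed", "checkin_date": "31/02/2015", "country": "Germany"},
]
result = clean_rows(sample)
```

Only the first sample row survives: the second is cancelled, the third has no check-in, and the fourth carries an impossible date.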
  29. Clean Dataset. Dirty: • 260,000 reservations • 80 fields (57 categorical, 9 numeric, 10 date, 3 text, 1 incorrect field) • Size: 150 MB. Clean: • 150,000 reservations • 46 fields (26 categorical, 9 numeric, 10 date, 1 text) • Size: 75 MB
  30. The Process: (1) Gathering Data: “Dirty” RAW Data; (2) Cleaning: “Clean” Data; (3) Transformations and Feature Engineering: New Fields, Calculated Fields; (4) Model
  31. Transformations. Country Grouping: • Many countries to predict (210) • Some countries have very few instances • Grouping objective: min. 1% of total instances per group • Does not affect the business objective • Total number of groups: 20. New Fields: • RESERV_ANTICIPATION (calculated): (reservation date - checkin date) • COUNTRY_HOTEL (name of the country) • HOTEL_STARS (1-5)
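A minimal sketch of the two transformations on this slide, in standard-library Python. It rests on assumptions: the deck does not say how the 210 countries were merged into 20 groups, so the sketch simply folds every country below the 1% threshold into a single hypothetical `OTHER` bucket. The slide writes the anticipation as reservation date minus check-in date; the sketch takes check-in minus reservation so that the result is a positive number of days, which is an assumption about the intended sign.

```python
from collections import Counter
from datetime import date


def group_rare_countries(countries, min_share=0.01, other_label="OTHER"):
    """Replace countries whose share of instances is below min_share.

    The single catch-all bucket is an assumption; the real grouping may
    have merged countries differently (for example by region).
    """
    counts = Counter(countries)
    total = len(countries)
    keep = {c for c, n in counts.items() if n / total >= min_share}
    return [c if c in keep else other_label for c in countries]


def reservation_anticipation(reservation_date, checkin_date):
    """Days between booking and check-in (assumed sign: positive when booked ahead)."""
    return (checkin_date - reservation_date).days


countries = ["ES"] * 600 + ["GB"] * 395 + ["IS"] * 5
grouped = group_rare_countries(countries)
days_ahead = reservation_anticipation(date(2015, 6, 1), date(2015, 7, 1))
```

Note that the catch-all bucket can itself end up below the threshold, as it does in this toy sample; a real grouping would need a second pass or a domain-driven merge to guarantee the 1% minimum.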
  32. Clean Dataset. Dirty: • 260,000 reservations • 80 fields • Size: 150 MB. Clean: • 150,000 reservations • 46 fields • Size: 75 MB. Transformed: • 150,000 records • 49 fields • Size: 80 MB
  33. What is Feature Engineering? Extract signal from noise
  34. Feature Engineering Techniques: • Detect fields (features) that are predictors (signal) and bypass those that are not (noise) • Dependent fields (pax, days, pax*days) • Needless fields (reservation number) • Fields with very little data • Random fields (minute and second of reservation) • Domain knowledge • Experience • Recursive cycle
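Two of the techniques listed here, dropping fields with very little data and dropping needless identifier fields, can be sketched as a simple filter. The thresholds, the field names, and the "unique in every row" heuristic for spotting identifiers are assumptions for illustration, not the selection procedure used in the talk; on a real dataset the uniqueness check can misfire on continuous numeric fields, so domain knowledge still decides.

```python
def select_fields(rows, max_missing_share=0.5, drop_fields=()):
    """Return the field names worth keeping as candidate predictors.

    Heuristics (assumed, not from the deck):
    - skip fields listed explicitly in drop_fields,
    - skip fields that are empty in more than max_missing_share of rows,
    - skip fields whose value is different in every row (identifier-like).
    """
    n = len(rows)
    kept = []
    for field in rows[0].keys():
        if field in drop_fields:
            continue
        values = [row.get(field) for row in rows]
        present = [v for v in values if v not in (None, "")]
        if (n - len(present)) / n > max_missing_share:
            continue  # field with very little data
        if n > 1 and len(set(present)) == n:
            continue  # needless field, e.g. a reservation number
        kept.append(field)
    return kept


sample = [
    {"reservation_id": "R1", "pax": 2, "notes": "", "hotel_stars": 4},
    {"reservation_id": "R2", "pax": 2, "notes": "", "hotel_stars": 5},
    {"reservation_id": "R3", "pax": 1, "notes": "late", "hotel_stars": 4},
    {"reservation_id": "R4", "pax": 3, "notes": "", "hotel_stars": 3},
]
selected = select_fields(sample)
```

In this sample `reservation_id` is dropped as identifier-like and `notes` as too sparse, leaving `pax` and `hotel_stars`.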
  35. Recursive Feature Engineering: Field Selection, Algorithm Adjustment, Prediction Quality Evaluation
  36. Clean Dataset. Dirty: • 260,000 reservations • 80 fields • Size: 150 MB. Clean: • 150,000 reservations • 46 fields • Size: 75 MB. Transformed: • 150,000 records • 49 fields • Size: 80 MB. Final Dataset: • 150,000 records • 10 fields • Size: 55 MB
  37. Agenda: (1) Business Need, (2) “Cooking” Predictions, (3) Gathering ingredients, (4) Cleaning and Transforming, (5) The recipe (the model), (6) Tasting the dish
  38. The Process: (1) Gathering Data: “Dirty” RAW Data; (2) Cleaning: “Clean” Data; (3) Transformation and Feature Engineering: New Fields, Calculated; (4) Modeling
  39. Modeling: Training, Learning
  40. Modeling
  41. Agenda: (1) Business Need, (2) “Cooking” Predictions, (3) Gathering ingredients, (4) Cleaning and Transforming, (5) The recipe (the model), (6) Tasting the dish
  42. Quality Evaluation: Dataset (100%) split into Training (80%) and Test (20%); Model; Evaluation
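The 80% / 20% split on this slide can be sketched with the standard library. The function name, the fixed seed, and the random shuffle are assumptions; the deck does not say how the split was drawn.

```python
import random


def train_test_split(rows, test_share=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


reservations = list(range(100))  # stand-in for 100 reservation records
training, test = train_test_split(reservations)
```

The model is trained only on `training`; `test` is held back so that the evaluation measures behaviour on reservations the model has never seen.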
  43. Quality Evaluation: Accuracy, Confusion Matrix
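Both measures named on this slide are straightforward to compute from paired lists of real and predicted labels. The sketch below is a generic illustration with made-up country codes, not the evaluation code or the results from the talk.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Share of instances whose predicted label equals the real one."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_matrix(actual, predicted):
    """Count each (real label, predicted label) pair.

    Returned as a Counter keyed by (actual, predicted); pairs that never
    occur read as 0.
    """
    return Counter(zip(actual, predicted))


# Made-up labels for illustration only.
actual = ["ES", "ES", "GB", "DE", "GB"]
predicted = ["ES", "GB", "GB", "DE", "ES"]

acc = accuracy(actual, predicted)
matrix = confusion_matrix(actual, predicted)
```

The diagonal entries, such as `matrix[("ES", "ES")]`, are correct predictions; off-diagonal entries, such as `matrix[("ES", "GB")]`, show which nationalities the model confuses with each other.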
  44. Quality Evaluation: 54%, 75%
  45. Quality Evaluation: Predicted vs Real Distribution
  46. Cooking Predictions: 80% / 20%. Go to the market to buy ingredients, Cleaning, Transforming, Cooking, Tasting the Dish
  47. Cooking Predictions: 80% / 20%. Gathering RAW data, Cleaning Data, Transforming and Feature Engineering, Training the Model, Evaluating Prediction Quality
  48. Other Techniques: Ensembles, Clusters, Weight Analysis, Anomaly Detection
  49. END. Email: andresg@clevertask.com. Twitter: @data_lytics. www.clevertask.com
