Some time ago I participated in the Emirates Group AI Hackathon. I would like to share my results (see my presentation below).
The problem statement - Group materialisation Forecast:
Groups book in bulk and block a large number of seats on a flight departure date. The general behaviour of group bookings is volatile. A group materialisation model will predict the expected materialisation rate on the reservations and will help influence the overbooking levels to avoid empty seats due to last minute cancellations.
1. made with from innovation lab
AI Hackathon.
Team: Vera Ekimenko
Technology Stack.
Spark for data wrangling
Spark ML Random Forests for the model
HTMLUnit for web scraping
Approach - Phases
Phases:
1. 0 days before Departure (all given data) <= used for evaluation
2. 2 days before Departure
3. 5 days before Departure
4. 14 days before Departure
5. 100 days before Departure
6. 0 days after Booking
7. 7 days after Booking
(Timeline diagram with Booking, an Evaluation Date 7 days after Booking, and Departure.)
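A minimal sketch of how the phases above could be turned into evaluation dates, so that only events known by that date feed the model. The `PHASES` table, the phase keys, and the event dictionaries are assumptions made for illustration; the slides do not show how the snapshots were actually built.

```python
from datetime import date, timedelta

# Hypothetical phase table: (anchor, offset in days).
# "departure" offsets count back from the departure date,
# "booking" offsets count forward from the booking date.
PHASES = {
    "0d_before_departure": ("departure", 0),
    "2d_before_departure": ("departure", 2),
    "5d_before_departure": ("departure", 5),
    "14d_before_departure": ("departure", 14),
    "100d_before_departure": ("departure", 100),
    "0d_after_booking": ("booking", 0),
    "7d_after_booking": ("booking", 7),
}


def evaluation_date(phase, booking, departure):
    """Return the cut-off date for a phase."""
    anchor, offset = PHASES[phase]
    if anchor == "departure":
        return departure - timedelta(days=offset)
    return booking + timedelta(days=offset)


def events_known_at(events, phase, booking, departure):
    """Keep only the events dated on or before the phase cut-off."""
    cutoff = evaluation_date(phase, booking, departure)
    return [e for e in events if e["date"] <= cutoff]


booking = date(2018, 1, 1)
departure = date(2018, 6, 1)
events = [
    {"type": "payment", "date": date(2018, 1, 5)},
    {"type": "payment", "date": date(2018, 5, 29)},
]
early = events_known_at(events, "7d_after_booking", booking, departure)
late = events_known_at(events, "2d_before_departure", booking, departure)
```

Filtering by cut-off date keeps each phase's model from seeing events that would not yet exist at prediction time.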
Approach - Feature Engineering
(Timeline diagram: the first and last event of each type, placed between Booking and Departure.)
Event types:
1. TLT action
2. Passenger details
3. Payments
4. Service requests
5. Ticket issue
Dates
• Every event has a date
• Duration since/prior: every range has 2 dates
• Number of days: every action has 4 time features
Counts
• Every action is an event
• Occurrence since/prior: calculate how many times it occurred
• Sum of occurrence: every action has 2 count features
• The number of days between the booking and the first/last addition of individual passenger details
• The number of days between the first/last payment and the departure
• The number of TLT record additions
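The time and count features described on this slide can be sketched as follows. This is an illustration in plain Python rather than the Spark code used in the hackathon; the function name and the `Count` key are hypothetical, while the `FromDays`/`FromPriorDays`/`ToDays`/`ToPriorDays` suffixes follow the feature names in the annex.

```python
from datetime import date


def time_and_count_features(name, event_dates, booking, departure):
    """Build four time features and one count feature for one event type."""
    if not event_dates:
        # No event of this type yet: time features are unknown, count is 0.
        return {
            f"{name}FromDays": None,
            f"{name}FromPriorDays": None,
            f"{name}ToDays": None,
            f"{name}ToPriorDays": None,
            f"{name}Count": 0,
        }
    first, last = min(event_dates), max(event_dates)
    return {
        f"{name}FromDays": (first - booking).days,         # booking -> first event
        f"{name}FromPriorDays": (departure - first).days,  # first event -> departure
        f"{name}ToDays": (last - booking).days,            # booking -> last event
        f"{name}ToPriorDays": (departure - last).days,     # last event -> departure
        f"{name}Count": len(event_dates),
    }


features = time_and_count_features(
    "Payments",
    [date(2018, 1, 11), date(2018, 2, 19)],
    booking=date(2018, 1, 1),
    departure=date(2018, 3, 1),
)
```

Missing events are returned as `None` here; how the original model encoded "no event yet" is not stated in the slides.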
Travel types based on segments analysis
Example itineraries:
• MELDXB-DXBMAA-CCUDXB-DXBMEL
• SINDXB-DXBATH-ATHDXB-DXBSIN
• WAWDXB-DXBIKA-DXBWAW
• DACDXB-DXBBAH
Travel types: Two destinations, One way, Disperse, One destination
Shares shown in the chart: 55%, 37%, 4%, 5%
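As a rough illustration of segment analysis, the sketch below parses an itinerary string into segments and derives a few simple facts about it. The hub code `DXB`, the function names, and the returned keys are assumptions; the exact rules used to assign the travel types are not given in the slides.

```python
def parse_itinerary(itinerary):
    """Split 'MELDXB-DXBMAA' into [('MEL', 'DXB'), ('DXB', 'MAA')]."""
    segments = []
    for chunk in itinerary.split("-"):
        if len(chunk) != 6:
            raise ValueError(f"expected two 3-letter codes, got {chunk!r}")
        segments.append((chunk[:3], chunk[3:]))
    return segments


def itinerary_flags(itinerary, hub="DXB"):
    """Describe an itinerary relative to its boarding point and the hub."""
    segments = parse_itinerary(itinerary)
    boarding_point = segments[0][0]
    final_arrival = segments[-1][1]
    # Every airport touched, except the hub and the boarding point.
    others = {airport for seg in segments for airport in seg}
    others -= {hub, boarding_point}
    return {
        "boarding_point": boarding_point,
        "other_airports": sorted(others),
        "returns_to_boarding_point": final_arrival == boarding_point,
    }


round_trip = itinerary_flags("MELDXB-DXBMAA-CCUDXB-DXBMEL")
one_way = itinerary_flags("DACDXB-DXBBAH")
```

Flags such as these could then feed indicator features like `isoneleg` or `isonedest`, though that mapping is a guess.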
External data
Sources:
• Passport Index
• TCdata360
• Holidays
• Airports
• USA travel advisories
Joined on: Segments (Boarding Point, Destination 1, Destination 2) and Departure Date
Accuracy
(Bar chart: accuracy with external data vs. without external data.)
External data boost
(Bar chart: boost from external data in Accuracy, PR AUC and ROC AUC, on a 0% to 7% scale.)
Feature Importance for different phases - 1
Feature Importance for different phases - 2
Scalability
Technical scalability
Features scalability
• Airport: country data, politics
• Organiser: travel agency, regular group travellers
• Season: Christmas, events
Self-learning
Initial model
• Selected features
• Training data
Test accuracy nightly
• Monitor model deterioration
• Re-generate the adjusted model
Adjustments
• New features
• New data
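One possible way to monitor model deterioration is to compare recent nightly test accuracy with the accuracy of the initial model and trigger re-generation when the gap grows too large. The function below is a sketch under that assumption; the `tolerance` and `window` values are hypothetical and not taken from the presentation.

```python
def needs_retraining(baseline_accuracy, nightly_accuracies, tolerance=0.02, window=7):
    """Return True when the mean of the last `window` nightly accuracies
    is more than `tolerance` below the baseline accuracy."""
    recent = nightly_accuracies[-window:]
    if not recent:
        return False  # nothing measured yet, keep the current model
    mean_recent = sum(recent) / len(recent)
    return baseline_accuracy - mean_recent > tolerance


stable = needs_retraining(0.90, [0.90, 0.89, 0.91])
degraded = needs_retraining(0.90, [0.85, 0.86, 0.84])
```

Averaging over a window rather than reacting to a single night avoids re-generating the model after one noisy evaluation.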
Reasons why it’s the best solution
• Native for HELIX -> easy to deploy and maintain
• Low maintenance -> easy to update the model
• Fast and scalable -> evaluate group bookings nightly with no fuss
• Good foundation for other models -> Recommender for new stations
• Pluggable -> can be used to enrich the existing models
• Transparency -> easy to explain to non-technical audiences how the model works, using Feature Importance
• Robustness -> the model works even if data quality is not perfect
Annex (1) – Full list of features used in the model
PAX - the total number of passengers in the group
Pcc2City - 1 = PCC equals City
IsGWS - 1 = GWS_ID is not empty
DepDateDays - the number of days between the booking and the departure
NumberSegments - the number of segments booked
BookingToRequestDays - the number of days between the request and the booking
RequestCreatedPriorDays - the number of days between the request and the departure
IndPaxAdditionFromDays - the number of days between the booking and the first addition of individual passenger details
IndPaxAdditionFromPriorDays - the number of days between the first addition of individual passenger details and the departure
IndPaxAdditionToDays - the number of days between the booking and the last addition of individual passenger details
IndPaxAdditionToPriorDays - the number of days between the last addition of individual passenger details and the departure
IndividualPaxAdditionSum - the number of additions of individual passenger details
IndividualPaxRemovalSum - the number of removals of individual passenger details
PaymentsFromDays - the number of days between the booking and the first payment
PaymentsFromPriorDays - the number of days between the first payment and the departure
PaymentsToDays - the number of days between the booking and the last payment
PaymentsToPriorDays - the number of days between the last payment and the departure
IncludedCSR - The payment is included in the sales report
Annex (2) – Full list of features used in the model
TLTActionDateAdditionsFromDays - the number of days between the booking and the first addition of a TLT record
TLTActionDateAdditionsFromPriorDays - the number of days between the first addition of a TLT record and the departure
TLTActionDateAdditionsToDays - the number of days between the booking and the last addition of a TLT record
TLTActionDateAdditionsToPriorDays - the number of days between the last addition of a TLT record and the departure
TLTActionDateRemovalFromDays - the number of days between the booking and the first removal of a TLT record
TLTActionDateRemovalFromPriorDays - the number of days between the first removal of a TLT record and the departure
TLTActionDateRemovalToDays - the number of days between the booking and the last removal of a TLT record
TLTActionDateRemovalToPriorDays - the number of days between the last removal of a TLT record and the departure
TLTAdditionsSum - the number of TLT record additions
TLTRemovalSum - the number of TLT record removals
ServicesCount - The number of service requests added
isonedest - The journey has one destination
ismultidest - The journey has two destinations
ismultirtn - The journey has multiple returning points
isoneleg - The journey is one way
isgathering - The journey has multiple boarding points
Annex (3) – Full list of features used in the model
ServicesDepartureDateMinDays - the number of days between the booking and the earliest departure date for the added services
ServicesDepartureDateMinPriorDays - the number of days between the earliest departure date for the added services and the departure
FirstDepDateMonth - the month of the departure
FirstDepDateDay - the day of the departure
IsIATA - 1 = The agent is a member of IATA
ACCEPTED_GROUP_SIZE - The accepted group size
ActuallyGivenAndAccepted - The difference between the accepted group size and the total number of passengers
KidPerAdult - The number of infant and child passengers per adult passenger
NotAcceptedAdult - The difference between the requested number of adult passengers and the accepted number of adult passengers
NotAcceptedChild - The difference between the requested number of child passengers and the accepted number of child passengers
NotAcceptedInfant - The difference between the requested number of infant passengers and the accepted number of infant passengers
ACC_ADULT - The accepted number of adult passengers
ACC_CHILD - The accepted number of child passengers
ACC_INF - The accepted number of infant passengers
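The group-size features in this annex are simple arithmetic. The sketch below shows one way they might be computed; the function name, the input dictionaries, the sign of each difference (requested minus accepted, accepted size minus PAX), and the handling of groups with no adults are all assumptions.

```python
def group_size_features(pax, accepted_size, requested, accepted):
    """Derive ratio and difference features from requested/accepted counts.

    `requested` and `accepted` are dicts with 'adult', 'child', 'infant' keys.
    """
    adults = accepted["adult"]
    kids = accepted["child"] + accepted["infant"]
    return {
        "ActuallyGivenAndAccepted": accepted_size - pax,
        # Guard against division by zero for groups without adults.
        "KidPerAdult": kids / adults if adults else 0.0,
        "NotAcceptedAdult": requested["adult"] - accepted["adult"],
        "NotAcceptedChild": requested["child"] - accepted["child"],
        "NotAcceptedInfant": requested["infant"] - accepted["infant"],
    }


group = group_size_features(
    pax=15,
    accepted_size=18,
    requested={"adult": 14, "child": 4, "infant": 2},
    accepted={"adult": 12, "child": 4, "infant": 2},
)
```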
Annex (4) – Full list of features used in the model
airport_infrastructure - The difference in levels of airport infrastructure in the first destination country
business_environment - The difference in levels of business environment in the first destination country
culresources_bustravel - The difference in levels of cultural resources and business travel in the first destination country
enabling_environment - The difference in levels of enabling environment in the first destination country
environmental_sustainability - The difference in levels of environmental sustainability in the first destination country
tourism_priority - The difference in levels of tourism priority in the first destination country
ground_port_infrastructure - The difference in levels of ground port infrastructure in the first destination country
health_hygiene - The difference in levels of health hygiene in the first destination country
labor_market - The difference in levels of labour market in the first destination country
infrastructure_subindex - The difference in levels of infrastructure sub-index in the first destination country
international_openness - The difference in levels of international openness in the first destination country
natural_cultural_resources - The difference in levels of natural and cultural resources in the first destination country
natural_resources - The difference in levels of natural resources in the first destination country
price_competitiveness - The difference in levels of price competitiveness in the first destination country
safety_security - The difference in levels of safety and security in the first destination country
tourist_infrastructure - The difference in levels of tourist infrastructure in the first destination country
travel_ict_readiness - The difference in levels of travel and tourism ICT readiness in the first destination country
travel_policy - The difference in levels of travel policy in the first destination country
travel_competitiveness - The difference in levels of Travel and Tourism policy and enabling conditions in the first destination country
Annex (5) – Full list of features used in the model
dest1_passport_requirements - The level of difficulty to get a visa to the first destination country
dest2_passport_requirements - The level of difficulty to get a visa to the second destination country if any
dest1_distance - The geographical distance between the boarding point and the first destination
dest1_timediff - The time lag between the boarding point and the first destination
adv_dest1_levelNN - USA travel advisory level for the first destination country
adv_dest2_levelNN - USA travel advisory level for the second destination country if any
AroundHoliday - The departure date is a public holiday (+/- 3 days) in the country of origin
AroundWeekend2 - The departure date is a weekend (+/- 1 day) in the country of origin
countriesOHE - The country of origin (one-hot encoded)
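The two calendar features can be sketched as date-window checks. The function names are hypothetical, the holiday list stands in for whatever holiday data was scraped, and the default weekend of Saturday and Sunday is an assumption, since weekend days differ between countries.

```python
from datetime import date, timedelta


def around_holiday(departure, holidays, window=3):
    """True if the departure is within `window` days of any public holiday."""
    return any(abs((departure - h).days) <= window for h in holidays)


def around_weekend(departure, weekend_days=(5, 6), window=1):
    """True if the departure is within `window` days of a weekend day.

    `weekend_days` uses date.weekday() numbering (Monday = 0, Sunday = 6).
    """
    return any(
        (departure + timedelta(days=shift)).weekday() in weekend_days
        for shift in range(-window, window + 1)
    )


holidays = [date(2018, 12, 25)]
near_holiday = around_holiday(date(2018, 12, 23), holidays)
near_weekend = around_weekend(date(2018, 12, 21))  # a Friday
```

Passing `weekend_days=(4, 5)` would cover a country with a Friday and Saturday weekend.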