Your SlideShare is downloading. ×
Prediction and Fuzzy Logic at ThomasCook to automate price ...
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Prediction and Fuzzy Logic at ThomasCook to automate price ...

290
views

Published on


0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
290
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
7
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. BNOSAC @ ThomasCook Challenges from a data mining point of view + solutions Connecting R with the outside world / our user experience Prediction and Fuzzy Logic at ThomasCook to automate price settings of last minute offers Jan Wijffels: jwijffels@bnosac.be BNOSAC - Belgium Network of Open Source Analytical Consultants www.bnosac.be July 10, 2009 Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 2. BNOSAC @ ThomasCook Who are we Challenges from a data mining point of view + solutions Business of ThomasCook Belgium Connecting R with the outside world / our user experience Introduction to last minute prices Introduction to BNOSAC Group of consultants focussed on open source analytical engineering Poor man’s BI: Python/PostgreSQL/Pentaho/OpenOffice/R. . . ... Expertise in predictive data mining, biostatistics, geostats, python programming, GUI building, artificial intelligence Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 3. BNOSAC @ ThomasCook Who are we Challenges from a data mining point of view + solutions Business of ThomasCook Belgium Connecting R with the outside world / our user experience Introduction to last minute prices Business of ThomasCook Belgium Sell holidays (sun and beach in this user case) 70 destinations around Mediterranean and Americas Own planes & bought seats need to be filled with passengers Flight frequence for some destinations up to 4 flights within one day. Some flights can be combined (BRU->ACE->FUE->ACE->BRU) Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 4. BNOSAC @ ThomasCook Who are we Challenges from a data mining point of view + solutions Business of ThomasCook Belgium Connecting R with the outside world / our user experience Introduction to last minute prices Introduction to last minute price settings Last minute prices departures Brussels/Li`ge/Ostend/Lille e Up to 2 months before departure People book now to go on holiday e.g. August 10, 2009 to destination X. Can stay 3-28 nights, choose among several hotels, with certain board (All Inclusive, B&B, . . . ) and certain room type. e.g. Hurghada (HRG): dayly flights from Brussels (BRU) # prices in August: 31 days × 12 durations × 2 brands × 20 hotels × 4 boards × 3 room types = ±248000 prices Prices can go or depending on offer and demand Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 5. BNOSAC @ ThomasCook Who are we Challenges from a data mining point of view + solutions Business of ThomasCook Belgium Connecting R with the outside world / our user experience Introduction to last minute prices Business challenge Business challenge Fill the planes at the highest prices so that the plane doesn’t fill too fast and make sure all seats are filled. Currently 2.9 Mio promotional prices on the market. Prices change dayly. Only cover approaches towards prices of packages (flight + hotel), only price effects of couples (so no children). Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 6. Optimisation problem BNOSAC @ ThomasCook Data & speed challenge Challenges from a data mining point of view + solutions Architectural solution Connecting R with the outside world / our user experience Analytical solution - optimal prices with business tactics Analytical solution: Fuzzy Logic Optimisation problem A lot of factors influencing bookings: Holiday information / Day of the week Flight information (hours of departure and of return flights, availability of flights) Weather Prices (2 brands, competitor) and price evolution Cannibalisation (risk of losing passengers to yourself) prices of similar destinations - last minute customers only want the sun at the cheapest price prices on similar departure dates (a few days later/earlier) Days before departure ... dimensionality is large (> 100000 factors could influence bookings on flight from BRU to HRG on August 10, 2009) Find the best price settings over all these parameters to ... optimize margin / minimize risk / optimize market share Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 7. Optimisation problem BNOSAC @ ThomasCook Data & speed challenge Challenges from a data mining point of view + solutions Architectural solution Connecting R with the outside world / our user experience Analytical solution - optimal prices with business tactics Analytical solution: Fuzzy Logic Data & speed challenge Data size last year only own last minute promotional prices: >450 million records. competitor prices flight info: ± 60000 flights on the market × 365 days ± 21.900.000 records weather info at noon: 70 destinations × 365 days × weather forecasts Speed ”Hello prices” at ±7o’clock in the morning (mainframe). ”Hello employees” at ±8h30 in the morning ±1h30 to make predictions and give ’best’ automatic price proposals Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 8. Optimisation problem BNOSAC @ ThomasCook Data & speed challenge Challenges from a data mining point of view + solutions Architectural solution Connecting R with the outside world / our user experience Analytical solution - optimal prices with business tactics Analytical solution: Fuzzy Logic Architectural solution Data Knowledge / Strategy Business process Update checker Price setting ­ GUI in wxPython (py2exe) Python / Beautifulsoup ­ Manager strategy on  ­ plots in R through RPy2    Price/Brand/Competition ­ Learned  Fuzzy ­ analytical data    cannibalisation effects    mart inference ­ Learned price elasticity ­ historical data engine ­ Predicted risk of  ­ clean    unsold seats ­ predictions /  ­ Weather risk check launch   best price settings ­ Historic price levels save price proposals ­ Selling margins ­ Basic 1D­optimisation MASTER FTP .txt Model building Web .xml Predictive models ­ Randomforests get data get data PL/R  .csv Variable reduction PL/SQL Oracle ­ glmpath NOAA ETL using R pimped users approve ­ flexible data  Model store price settings    structures + structure ­ easy to program     & maintain save .RData ­ access to anything predictions ­ fast development     in case of change ­ with (R)SQLite &  Predictions    sqldf ­ can handle    any data size pimped ­ get model structure SLAVE / application DB ­ prepare for prediction ­ predict     Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 9. Optimisation problem BNOSAC @ ThomasCook Data & speed challenge Challenges from a data mining point of view + solutions Architectural solution Connecting R with the outside world / our user experience Analytical solution - optimal prices with business tactics Analytical solution: Fuzzy Logic Analytical solution: Predictive modelling Out of the box solutions exist in R. ’Best practice’ approach: Pimp SQLite so that it can handle tables with up to ±30000 columns. Raw model tables dim 20.000.000 x 30000 Data preparation (missing values, split numeric data in categories) - do heavy reshaping/juggling/merging/indexing in (R)SQLite & sqldf, use R for advanced data features Sample depending on CPU/RAM and statistical technique: we have 4 dual cores, 64bit Linux, 32Gb RAM. Reduce: GLM with penalization on the size of the L1 norm of the coefficients L(β, λ) = − n yi θ(β)i b(θ(β)i ) + λ β 1 i=0 (glmpath package) Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 10. Optimisation problem BNOSAC @ ThomasCook Data & speed challenge Challenges from a data mining point of view + solutions Architectural solution Connecting R with the outside world / our user experience Analytical solution - optimal prices with business tactics Analytical solution: Fuzzy Logic Analytical solution: Predictive modelling cont. Only most important predictors to build randomForest Use randomForest model to predict how fast the flights will fill. Variable importance afreis.week q boeking.week q neckermann.bru.7.lo q neckermann.lgg.7.ai q rt8.free q neckermann.bru.7.ai q thomascook.bru.7.ai q thomascook.bru.14.lo q rt7.free q neckermann.bru.10.ai q thomascook.bru.5.lo q rt11.free q f.t7.lang q afreis.weekdag q rt7.vl2.free q boeking.weekdag q rt14.free q f.t10.lang q thomascook.bru.10.ai q rt5.free q thomascook.bru.12.ai q rt5.vl1.free q f.t11.lang q f.t11.kort q neckermann.bru.5.ai q rt9.vl2.free q thomascook.bru.5.hp q iata.from q rt14.vl1.combi q f.t7.kort q 0 50 100 150 200 250 300 350 IncNodePurity Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 11. Optimisation problem BNOSAC @ ThomasCook Data & speed challenge Challenges from a data mining point of view + solutions Architectural solution Connecting R with the outside world / our user experience Analytical solution - optimal prices with business tactics Analytical solution: Fuzzy Logic Analytical solution: Predictive modelling cont. Get the price effects from the randomForest model and use it: Do fast 1- or 2-dimensional optimisation to fill seats that will not be filled according to the forecast at the optimal price. Price elasticity qqqqqqqqqqqqqqqqqqqqqqqq 1.2 q qqq q 1.1 q 1.0 Passengers / day q 0.9 qq q q q q q q q q q q 0.8 q qqq q q qqq 0.7 q q q q q qqq qq qq qqqqqq q q qqq q q q q qq qqq qqq 0.6 q q q q qqqqqq qq 400 600 800 1000 Lowest ThomasCook Price, T7 AI Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 12. Optimisation problem BNOSAC @ ThomasCook Data & speed challenge Challenges from a data mining point of view + solutions Architectural solution Connecting R with the outside world / our user experience Analytical solution - optimal prices with business tactics Analytical solution: Fuzzy Logic Analytical solution: Fuzzy Logic Prediction and optimisation is nice but not enough Managers reason with words/concepts. Mimic them and combine their logic with predictive logic. How? Map business concepts to fuzzy sets. Make fuzzy rule-based engine reflecting how managers/business users decide on price settings Do fuzzy inference to obtain new price settings Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 13. Optimisation problem BNOSAC @ ThomasCook Data & speed challenge Challenges from a data mining point of view + solutions Architectural solution Connecting R with the outside world / our user experience Analytical solution - optimal prices with business tactics Analytical solution: Fuzzy Logic Analytical solution: Fuzzy Logic cont. Map business concepts to fuzzy sets. Low­level fuzzy sets/inputs Higher­level fuzzy sets/inputs Current risk ­ Current flight situation empty seats  Listen to the people. Predicted risk Fuzzy concepts have ­ Randomforest Prediction  empty seats blurred boundaries. ­ Simulation @ different   price levels & timepoints Elasticity­based  optimal Price Move Fuzzy  rule base ­ Price elasticity in models Competitor risk  Map linguistic variables to   + importance measures   for different competitors same destination ­ Overlapping hotels difference Competitor risk Optimal  a membership degree ­ Overlapping hotels price change ­ Other hotels price level ­ Other hotels price change Competitor risk  other destinations Price Move µ(x) ∈ [0, 1] ­ Price elasticity in models ­ Current price levels Outgoing  cannibalisation ­ Historic cannibalisation levels sets package (Hornik K., Incoming Cannibalisation Meyer D., Buchta C.) Days  before departure fuzzy normal, Expert opinions & other inputs ... fuzzy trapezoid, fuzzy sigmoid, . . .     Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 14. Optimisation problem BNOSAC @ ThomasCook Data & speed challenge Challenges from a data mining point of view + solutions Architectural solution Connecting R with the outside world / our user experience Analytical solution - optimal prices with business tactics Analytical solution: Fuzzy Logic Analytical solution: Fuzzy Logic cont. Make fuzzy rule-based engine, do fuzzy inference & defuzzify. rules <- set( fuzzy_rule(predicted_risk %is% low, price_change %is% up), fuzzy_rule(predicted_risk %is% high & competitor_risk %is% high, price_change %is% down_high) ...) simple.system <- fuzzy_system(variables, rules) fuzzy.best.price <- fuzzy_inference(simple.system, NEWDATA) gset_defuzzify(fuzzy.best.price, "centroid") Different business strategies can be easily mapped to fuzzy inference engines. Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 15. BNOSAC @ ThomasCook Influence the business process Challenges from a data mining point of view + solutions PL/R, RPy2, GUI’s in R, people Connecting R with the outside world / our user experience Questions? Influence the business process, use visuals, build GUI Prediction, optimisation and improving on business users is nice but not enough, you need to influence the business process Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 16. BNOSAC @ ThomasCook Influence the business process Challenges from a data mining point of view + solutions PL/R, RPy2, GUI’s in R, people Connecting R with the outside world / our user experience Questions? PL/R, RPy2, GUI’s in R, people PL/R. Had a lot of shared memory problems while other processes were runnning. But probably overkilled it (run PL/R script which calls some R code from within R process that uses RdbiPgSQL) Debugging hell. R & SQLite is our best choice for heavy data juggling. PL/R is OK for collecting information on diverse data sources in 1 call from a remote machine. Useful for plotting purposes in SaaS framework. Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 17. BNOSAC @ ThomasCook Influence the business process Challenges from a data mining point of view + solutions PL/R, RPy2, GUI’s in R, people Connecting R with the outside world / our user experience Questions? PL/R, RPy2, GUI’s in R, people cont. User interfaces - developer view Combining wxPython and R through RPy2 is easy and simple. py2exe gives easy python binary executables, people only need to have R installed to access its power User interfaces - IT view IT departments don’t like R R should be SaaS, central server where people can connect to User interfaces - business user point of view They don’t care about R GUI and plotting the results helped convincing them Fuzzy logic allowed them to interact and stick to the business. Combining the results with an improved business process was the most convincing factor. Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price
  • 18. BNOSAC @ ThomasCook Influence the business process Challenges from a data mining point of view + solutions PL/R, RPy2, GUI’s in R, people Connecting R with the outside world / our user experience Questions? Questions? http://www.bnosac.be Jan Wijffels: jwijffels@bnosac.be Prediction and Fuzzy Logic at ThomasCook to automate price