Data science challenges in flight search

Have you ever searched for a flight online? Do you wonder when and where you can get the best price for your travel plans? And why are there different prices for the same flight? Do you want to know why it is hard to build a meta-search engine for travel, and especially for flights?

Presentation provided by Skyscanner, a leading travel search site offering unbiased, comprehensive and free flight, hotel and car hire search services, used by over 40 million unique visitors every month. Skyscanner opened its office in Sofia in October 2014 and is quickly growing its team here to help solve complex travel problems and continually improve its product.

  1. 1. Data science challenges in flight search Konstantin Halachev Plamen Aleksandrov 29th July 2015
  2. 2. #SkyscannerSofia Agenda • Introduction • Why is it hard to do meta-search for flights? • A few applications of flights meta-search data Image from mastersindatascience.org
  3. 3. Who are we? Konstantin Halachev • Data science for bioinformatics (PhD with focus on epigenetic data) • Joined the new Skyscanner office in Sofia nine months ago Plamen Aleksandrov • Worked on flights search engine • Principal software engineer and squad lead in Skyscanner
  4. 4. What is Skyscanner? Skyscanner is a leading travel search site offering: • unbiased • comprehensive • free search services
  5. 5. Skyscanner in numbers? • 9 global offices; Sofia is the latest (started with 7 people, now at 16 and growing fast) • 700+ employees • 40+ million app downloads • 40+ million unique monthly visitors • 13+ million searches per day
  6. 6. #SkyscannerSofia Why is it hard to do meta-search for flights?
  7. 7. How do you plan your travel? by destination and dates by destination, choose dates by dates, choose destination Online Travel Search - Flights?
  8. 8. #SkyscannerSofia Airline industry
  9. 9. Flights Frequency • 4000+ airports served by commercial airlines • 700+ airlines in the world; 25,000+ aircraft • 40 million scheduled commercial flights in 2014 • 100,000 flights per day, i.e. >1 per second • 40% of flights within US and Canada • 79% average airplane fill rate • 3 billion passengers in 2014 Source: http://www.iata.org/publications/economics/Pages/Air-Passenger-Monthly-Analysis.aspx
  10. 10. Profitability • at $8.27 per passenger • distribution is where the money is • Profitability is growing due to oil prices Source: http://www.iata.org/publications/economics/Pages/Air-Passenger-Monthly-Analysis.aspx
  11. 11. #SkyscannerSofia Dimensionality of flights
  12. 12. Routes Source: https://www.itasoftware.com
  13. 13. Itinerary Structure • One Ways and Round Trips • Multi-leg: Open Jaws, Circle Trips • Fares, Fare components, Pricing Units, Tickets [Diagram: example itinerary shapes between airports A, B and C]
  14. 14. Example Route • take AA flights/fares on a SFO-BOS route • A total of 25,401,415 valid AA solutions • Only this particular airline and route [Diagram: SFO, ORD, DFW, BOS with fare-component (fc) counts per segment: 5 * 36 = 85 fc; 19 * 32 = 109 fc; 41 * 32 = 162 fc; 9 * 32 = 87 fc]
  15. 15. Dates • Even exact dates are complicated: time to travel changes price; weekend stay and seasonality; advance purchase • Dates give interesting features and patterns: day of week; stay duration; age of quote/price; seasons (Christmas, Easter, holidays)
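
The date features named on the Dates slide can be derived from a single price quote roughly as follows. This is a minimal sketch, not Skyscanner's implementation: the function `date_features`, its parameters and the output field names are all hypothetical, and "weekend stay" is assumed here to mean a trip that includes a Saturday night.

```python
from datetime import date, datetime, timedelta
from typing import Optional

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def date_features(quoted_at: datetime, as_of: datetime,
                  outbound: date, inbound: Optional[date] = None) -> dict:
    """Derive date features for one price quote (field names are hypothetical)."""
    features = {
        "day_of_week": WEEKDAYS[outbound.weekday()],
        "travel_month": outbound.month,
        # Advance purchase: days between the quote and the departure.
        "advance_purchase_days": (outbound - quoted_at.date()).days,
        # Age of the quote at the moment it is analysed, in hours.
        "quote_age_hours": (as_of - quoted_at).total_seconds() / 3600,
    }
    if inbound is not None:
        stay = (inbound - outbound).days
        features["stay_duration_days"] = stay
        # Assumption: a "weekend stay" includes at least one Saturday night.
        features["weekend_stay"] = any(
            (outbound + timedelta(days=n)).weekday() == 5 for n in range(stay)
        )
    return features


# Illustrative input only: a quote taken on 1 April 2015 for a May trip.
example = date_features(
    quoted_at=datetime(2015, 4, 1, 12, 0),
    as_of=datetime(2015, 4, 2, 12, 0),
    outbound=date(2015, 5, 6),
    inbound=date(2015, 5, 11),
)
```

Seasonal flags (Christmas, Easter, holidays) are left out because they depend on a market-specific calendar that the slides do not describe.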
  16. 16. Prices • Airlines use seat availability to adjust price • prices are volatile – 26 booking classes • Airlines do Variable Pricing for fare portfolios • your flight neighbour paid a different price • 15,000,000 availability questions per sec • no lock-down between search and book • Prices for the same seat can still be different • who sells your ticket? – codeshare, agency, OTA
  17. 17. Distribution Providers • All tickets are booked at a website or via a GDS
  18. 18. #SkyscannerSofia Data and Scale at Skyscanner
  19. 19. Searches and Exits • 40m unique monthly visitors • 120m visits on web and mobile per month • 13m searches per day • results are up-to-date • user experiences on the web • Searches on Month view and Browse view • Exits by redirects • we don’t take ownership of the booking • we keep true to our users, providers and own values
  20. 20. Data • 2bn quotes per day => 700bn quotes per year • quotes contain entire itinerary and price • data can be easily processed and/or extracted • prices are up to date, but we also keep historical data • 200GB gzipped data per day => 80TB per year • 95% airlines and OTAs world coverage
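
The yearly totals on the Data slide follow from the daily figures by simple multiplication. The sketch below redoes that arithmetic, assuming a 365-day year and decimal units (1 TB = 1000 GB); the constant names are illustrative. Straight multiplication gives roughly 730bn quotes and 73 TB per year, so the slide's 700bn and 80TB read as rounded figures.

```python
# Daily figures as stated on the slide.
QUOTES_PER_DAY = 2_000_000_000
GZIPPED_GB_PER_DAY = 200

# Assumptions: 365-day year, decimal units.
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60

quotes_per_year = QUOTES_PER_DAY * DAYS_PER_YEAR
tb_per_year = GZIPPED_GB_PER_DAY * DAYS_PER_YEAR / 1000
quotes_per_second = QUOTES_PER_DAY / SECONDS_PER_DAY
# Average compressed size of a single quote, in bytes.
gzipped_bytes_per_quote = GZIPPED_GB_PER_DAY * 1e9 / QUOTES_PER_DAY

print(f"{quotes_per_year:,} quotes per year")
print(f"{tb_per_year:.0f} TB gzipped per year")
print(f"{quotes_per_second:,.0f} quotes per second on average")
print(f"{gzipped_bytes_per_quote:.0f} gzipped bytes per quote on average")
```

The last figure is the one that matters for storage design: at about 100 compressed bytes per quote, the volume comes from the number of quotes rather than the size of each one.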
  21. 21. How much data is that? A small list of technologies used: • Thrift/ RabbitMQ/ Ruby/ FluentD, • Scala/ Spark/ Hive, • AWS (S3, Glacier, EC2, Elastic MapReduce, DynamoDB), • Elasticsearch/ Kibana, • Python/ Flask Image from vicchi.org 2,000,000,000 quotes per day
  22. 22. #SkyscannerSofia What can we do with these data?
  23. 23. Search
  24. 24. Search
  25. 25. Search
  26. 26. Search
  27. 27. Search
  28. 28. #SkyscannerSofia What can we do with these data? 1. Dynamics of flight prices 2. Travel Insights for airlines and airports 3. Inspiration – finding good deals 4. A small analysis
  29. 29. Dynamics of flight prices Route LON - MAD Direct only One way 1. Too many routes -> Let’s select a popular route (London - Madrid) 2. Let’s focus on direct connections only 3. Let’s focus on one-way only
  30. 30. Dynamics of flight prices Route LON - MAD Direct only One way
  31. 31. Dynamics of flight prices Route LON - MAD Direct only One way Carrier
  32. 32. Dynamics of flight prices Route LON - MAD Direct only One way Carrier – Ryanair
  33. 33. Dynamics of flight prices Route LON - MAD Direct only One way Carrier – Ryanair Travelling on
  34. 34. Dynamics of flight prices Route LON - MAD Direct only One way Carrier – Ryanair Travelling on
  35. 35. Dynamics of flight prices Route LON - MAD Direct only One way Carrier – Ryanair Travelling on Wednesday
  36. 36. Dynamics of flight prices Route LON - MAD Direct only One way Carrier – Ryanair Travelling on Wednesday Month of travel
  37. 37. Dynamics of flight prices Route LON - MAD Direct only One way Carrier – Ryanair Travelling on Wednesday Month of travel
  38. 38. Dynamics of flight prices Route LON - MAD Direct only One way Carrier – Ryanair Travelling on Wednesday Month of travel - May
  39. 39. Dynamics of flight prices Route LON - MAD Direct only One way Carrier – Ryanair Travelling on Wednesday Month of travel - May
  40. 40. Dynamics of flight prices Route LON - MAD Direct only One way Carrier – Ryanair Travelling on Wednesday Month of travel - May
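
The step-by-step narrowing in slides 29 to 40 (route, direct only, one way, carrier, day of week, month of travel) amounts to a chain of filters over the quote data. The following is a minimal sketch of that idea: the `Quote` record, the `narrow` function and the sample rows are hypothetical and do not reflect Skyscanner's actual schema or prices.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Quote:
    """Hypothetical, simplified quote record."""
    origin: str
    destination: str
    carrier: str
    stops: int
    one_way: bool
    departure: date
    price: float


def narrow(quotes: Iterable[Quote], origin: str, destination: str,
           carrier: str, weekday: int, month: int) -> List[Quote]:
    """Keep direct, one-way quotes for one route, carrier, weekday and month."""
    return [
        q for q in quotes
        if q.origin == origin and q.destination == destination
        and q.stops == 0                       # direct only
        and q.one_way                          # one way only
        and q.carrier == carrier
        and q.departure.weekday() == weekday   # Monday is 0, Wednesday is 2
        and q.departure.month == month
    ]


# Made-up sample rows, one per filter that rejects them.
sample = [
    Quote("LON", "MAD", "Ryanair", 0, True, date(2015, 5, 6), 40.0),    # kept
    Quote("LON", "MAD", "Ryanair", 0, True, date(2015, 5, 7), 35.0),    # Thursday
    Quote("LON", "MAD", "Ryanair", 0, True, date(2015, 6, 3), 55.0),    # June
    Quote("LON", "MAD", "OtherAir", 0, True, date(2015, 5, 13), 60.0),  # other carrier
    Quote("LON", "MAD", "Ryanair", 1, True, date(2015, 5, 20), 45.0),   # one stop
    Quote("LON", "MAD", "Ryanair", 0, False, date(2015, 5, 27), 90.0),  # round trip
]

selected = narrow(sample, "LON", "MAD", "Ryanair", weekday=2, month=5)
```

Fixing every dimension except one, as the slides do, is what makes price curves comparable: each remaining quote differs only in when it was observed and which Wednesday in May it is for.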
  41. 41. #SkyscannerSofia What can we do with these data? 1. Dynamics of flight prices 2. Travel Insights for airlines and airports 3. Inspiration – finding good deals 4. A small analysis
  42. 42. Travel Insights – for airlines and airports
  43. 43. Travel Insights – for airlines and airports
  44. 44. Travel Insights – for airlines and airports
  45. 45. Travel Insights – for airlines Another small list of technologies used: • Python, .NET • AWS (S3, Redshift, EC2), MS SQL • Tableau
  46. 46. #SkyscannerSofia What can we do with these data? 1. Dynamics of flight prices 2. Travel Insights for airlines and airports 3. Inspiration – finding good deals • Where? • When? • Which deal is good? 4. A small analysis
  47. 47. Travel Inspiration - When and Where
  48. 48. Travel Inspiration - a hack day project
  49. 49. Travel Inspiration – is it a good deal?
  50. 50. Travel Inspiration - Skyscanner API Technologies used: Google maps, Python, Flask, AWS Redshift, Skyscanner API You want to do better? http://business.skyscanner.net/ You can get a trial API key by filling in the feedback form at the end of the event: http://goo.gl/forms/i4C2VcSGyW
  51. 51. #SkyscannerSofia What can we do with this data? 1. Dynamics of flight prices 2. Travel Insights for airlines and airports 3. Inspiration – finding good deals 4. A small analysis or how did demand for trips to Greece change in the heat of the crisis and what do the Danish know about it?
  52. 52. Analysis - Greece
  53. 53. Analysis - Greece Red represents week on week decrease. Green is increase. Data for 2015
  54. 54. Analysis - Greece Red represents week on week decrease. Green is increase. Data for 2014
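
The red and green colouring described on the two Greece slides can be reproduced from a weekly series of search counts. This is a minimal sketch under stated assumptions: the function `week_on_week`, the "neutral" label for an unchanged week and the sample counts are all hypothetical, and the slides do not say which measure of demand was actually plotted.

```python
from typing import List, Optional, Tuple


def week_on_week(counts: List[int]) -> List[Tuple[Optional[float], str]]:
    """Return (relative change, colour) for every week after the first."""
    result = []
    for previous, current in zip(counts, counts[1:]):
        if previous == 0:
            # No baseline to compare against, so no change can be computed.
            result.append((None, "neutral"))
            continue
        change = (current - previous) / previous
        if change > 0:
            colour = "green"    # week-on-week increase
        elif change < 0:
            colour = "red"      # week-on-week decrease
        else:
            colour = "neutral"  # assumption: the slides define only red and green
        result.append((change, colour))
    return result


# Made-up weekly search counts, for illustration only.
changes = week_on_week([100, 120, 90, 90])
```

Comparing the same weeks in 2014 and 2015, as the two slides do, helps separate a seasonal pattern from a change that is specific to one year.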
  55. 55. What we know we did not talk about? • What is the best way to get the cheapest deals? • Recommendations • Personalization • A/B testing • Sorting of flight results • Infrastructure • Ahum, “Travel”… Image credit: jangosteve.com
  56. 56. #SkyscannerSofia Thank you! Please give us feedback or apply for API keys here: http://goo.gl/forms/i4C2VcSGyW • Konstantin Halachev konstantin.halachev@skyscanner.net • Plamen Aleksandrov plamen.aleksandrov@skyscanner.net We are hiring!!!
