Commit Conf 2018 - Hotelbeds' journey to a microservice cloud-based architecture


Hotelbeds is one of the world's leading bedbanks: a global distributor of accommodation based in Palma de Mallorca, offering more than 170,000 hotels and handling up to 2,500 million requests per day.

In the last two years, Hotelbeds Group has undergone a profound technical transformation, from an old monolithic on-premise architecture to a cloud-native microservices architecture. Today a large part of our monolith has been transformed into Java microservices deployed in the cloud, and we already process all our bookings in the cloud. We have also moved our availability engine to the cloud, distributing it across several locations around the globe to be closer to our clients and reduce latency.

This talk is about the path we followed on this fascinating journey, the decisions taken and lessons learned, as well as some insight on our future challenges.

Video available at

* All images used in this presentation are for illustration purposes only. They have been created specifically or obtained from publicly accessible sources. If any image is used incorrectly, please contact me and it will be removed immediately and not used in the future.


  1. Jordi Puigsegur Figueras ▪ Head of Solution Architecture, Hotelbeds ▪ Course Instructor, High Scale Distributed Systems, Universitat Oberta de Catalunya
  2. Agenda: 1. Who we are ▪ 2. Hotelbeds journey ▪ 3. Survival principles
  3. ▪ Hotelbeds Group is a leading bedbank and a business-to-business (B2B) provider to the global travel industry. ▪ In September 2016, Hotelbeds Group was acquired by Cinven and the Canada Pension Plan Investment Board. ▪ In June 2017, Tourico Holidays became part of Hotelbeds Group, and in October GTA also joined the Group. ▪ Hotelbeds Group offers travel providers access to a network of over 60,000 travel sellers from around 185 source markets globally. ▪ Travel sellers have access to over 170,000 hotels, 22,000+ transfer routes, 16,000+ activities and 142,000 car rental products. ▪ The technology platform handles around 2 billion data requests per day, with peaks of up to 2.5 billion requests per day and 40,000 requests per second, from users worldwide. ▪ 5,000+ employees and 210 offices worldwide. The biggest single site is the head office in Palma de Mallorca, where over 1,500 people work.
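As a back-of-the-envelope check on slide 3's traffic figures, the daily volumes can be converted to an average per-second rate. This assumes, unrealistically, that a day's requests are spread evenly over its 86,400 seconds; it is a sanity check on the quoted numbers, not a capacity model:

```latex
\begin{aligned}
\bar{r}_{\text{typical day}} &= \frac{2 \times 10^{9}\ \text{req}}{86\,400\ \text{s}} \approx 23\,148\ \text{req/s} \\
\bar{r}_{\text{peak day}} &= \frac{2.5 \times 10^{9}\ \text{req}}{86\,400\ \text{s}} \approx 28\,935\ \text{req/s} \\
\frac{r_{\text{peak}}}{\bar{r}_{\text{peak day}}} &= \frac{40\,000\ \text{req/s}}{28\,935\ \text{req/s}} \approx 1.38
\end{aligned}
```

So the 40,000 requests per second peak is roughly 1.4 times the mean rate of even a 2.5-billion-request day; that gap between intra-day peaks and the daily average is the kind of load variation autoscaling has to absorb.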
  4. Evolution of the Hotelbeds distribution platform: ▪ 2002, XML1: technical breakthrough in the market; accommodation product only; thousands of requests per day ▪ 2008, XML2: transfers and activities product; product updates for content (FTP); 1M requests per day ▪ 2010, AIF2: cache pull service; proprietary, rules-based format; allows customers to scan inventory; 10M requests per day ▪ 2015, APItude: API-driven product distribution strategy (suite of APIs); focus on scalable, high-performance platforms; development experience driven by ease of integration; 200M requests per day ▪ 2017, APItude Cloud: 1,200M requests per day; 20k requests per second ▪ 2018, APItude Distributed Availability: multi-region deployments; 2,000M requests per day; 40k requests per second
  5. Three main initiatives will shape Hotelbeds' technical evolution from 2016 to 2018: ▪ APItude migration to cloud ▪ ATLAS+ Project: breaking the monolith ▪ Distributed Availability: going global
  6. ▪ APItude is a redesign of our APIs ▪ Live by the end of 2015 ▪ 30% faster than the existing API ▪ DX (developer experience) approach: simpler and based on new technologies ▪ Sets the ground for a modern cloud-native microservice architecture: ○ Cloud-ready services based on Spring Boot ○ Immutable deployments (rpm) and external configuration ○ New technology components: Redis, Spring Cloud Config Server ○ Focus on enabling automation ○ New "cloud-friendly" architectural patterns
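Immutable deployments with external configuration mean the same rpm is promoted unchanged from one environment to the next, and only its settings differ. A minimal sketch of that idea in plain Java, assuming a lookup order of environment variable, then externally supplied properties (for example fetched from a config server), then a built-in default; `ExternalConfig` and the key names are hypothetical, not the Spring or Hotelbeds API:

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch: one immutable artifact, configured from outside.
 * Assumed lookup order: environment variable > external properties > default.
 */
final class ExternalConfig {
    private final Map<String, String> env;  // e.g. System.getenv()
    private final Properties external;      // e.g. fetched from a config server at startup

    ExternalConfig(Map<String, String> env, Properties external) {
        this.env = env;
        this.external = external;
    }

    /** "cache.redis.host" -> "CACHE_REDIS_HOST"; dashes are dropped ("pool-size" -> "POOLSIZE"). */
    static String toEnvKey(String key) {
        return key.replace(".", "_").replace("-", "").toUpperCase(Locale.ROOT);
    }

    String get(String key, String defaultValue) {
        String fromEnv = env.get(toEnvKey(key));
        if (fromEnv != null) {
            return fromEnv;  // per-environment override wins
        }
        return external.getProperty(key, defaultValue);
    }

    int getInt(String key, int defaultValue) {
        return Integer.parseInt(get(key, Integer.toString(defaultValue)));
    }
}
```

In a service this would be built once at startup, e.g. `new ExternalConfig(System.getenv(), propsFromConfigServer)`. The environment-variable mapping is in the spirit of Spring Boot's relaxed binding, but this sketch does not claim to reproduce Spring's exact rules.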
  7. ▪ Big monolithic Oracle database with most of the company's business logic inside ▪ Some satellite Java services… ○ …but all logic still in PL/SQL code ○ ~2,500 tables, ~1.1M LoC ○ Monthly releases with a full stop of up to 1 h ▪ The new APItude API platform is already live, but still hosted on premise ▪ Most hotel availability requests are served by the legacy XML2 platform (on premise) ▪ APItude rollout is just beginning
  8. ▪ We are beginning to face serious scalability issues ▪ Our Oracle-based platform will not cope with the expected business growth ▪ Vertical scaling is not an option… we are already running on powerful Oracle Exadata hardware ▪ Some estimates leave only 18 months until platform saturation
  9. ▪ The main driver is reducing latency for our globally distributed customer base ▪ Availability requests are increasing exponentially, therefore: ○ We need more flexibility to grow and evolve new APItude services ○ We need autoscaling to adapt to varying loads (daily / weekly / seasonal) ▪ Cloud can also be a cost-saving driver
  10. ▪ The cloud migration strategy is mixed: ○ Migrating new cloud-native components ○ Plus lift and shift of older ones ▪ Deployed in AWS, 1 region: Europe ▪ Based on IaaS deployments of immutable binary rpms with external configuration ▪ Some managed services (ElastiCache and ELBs) ▪ Adjusting and fine-tuning autoscaling takes time ▪ Good monitoring is crucial
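Part of why autoscaling "takes time" to tune is that every rule rests on a few numbers someone has to choose. A minimal sketch of a target-tracking rule, assuming load is measured in requests per second and that each instance should serve about a fixed target; `AutoscalingPolicy` and all values are hypothetical, not the AWS configuration used by Hotelbeds:

```java
/**
 * Hypothetical target-tracking rule: run enough instances that each one
 * serves roughly targetRpsPerInstance, never fewer than min or more than max.
 */
final class AutoscalingPolicy {
    private final double targetRpsPerInstance;
    private final int minInstances;
    private final int maxInstances;

    AutoscalingPolicy(double targetRpsPerInstance, int minInstances, int maxInstances) {
        if (targetRpsPerInstance <= 0 || minInstances < 1 || maxInstances < minInstances) {
            throw new IllegalArgumentException("invalid autoscaling bounds");
        }
        this.targetRpsPerInstance = targetRpsPerInstance;
        this.minInstances = minInstances;
        this.maxInstances = maxInstances;
    }

    int desiredInstances(double totalRps) {
        int needed = (int) Math.ceil(totalRps / targetRpsPerInstance);
        // Clamp: a floor for resilience, a ceiling for cost control.
        return Math.max(minInstances, Math.min(maxInstances, needed));
    }
}
```

With an invented target of 500 req/s per instance, a minimum of 2 and a maximum of 40, a load of 1,200 req/s yields 3 instances and anything above 20,000 req/s is capped at 40. Real policies add cooldowns and metric smoothing on top; AWS target tracking policies apply a comparable idea to a chosen metric.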
  11. ▪ Project focused on extracting the core business logic from inside our big monolithic Oracle database ▪ Migration of business logic to cloud-native microservices: ○ Full reengineering of backend services ○ No business involvement… transparent migration ▪ In numbers: 4 teams, 50 developers, 9 months, 20 new Spring Boot services, 70% on cloud
  12. ▪ Hybrid on-premise/cloud approach: ○ Madrid datacenter + 1 AWS region ○ Logic is moved to cloud microservices ○ Data is kept in the Oracle DB (on premise) ○ Prioritize use of cloud data ▪ On-premise-to-cloud data replication becomes crucial ○ Use of Kafka (own deployment) ▪ PostgreSQL is the database of choice for microservices ○ Managed RDS instances ○ Sometimes a NoSQL approach
  13. [Diagram: AtlasDB, BOOKING BL, BOOKING API and PostgreSQL, split between on-premise and cloud] Example booking operations: ▪ Booking List ▪ Booking Detail
  14. ▪ Hotel availability across three regions: ○ Europe ○ North America ○ Asia ▪ Global data replication using Kafka ▪ Customers are geographically redirected by dynamic DNS (check Eric Janz's talk this afternoon) ▪ Better latencies across the globe ▪ New options for growth ▪ Good monitoring becomes crucial
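The slide does not say which DNS product performs the geographic redirection, so the following only models the decision such a layer makes: among the regions currently passing health checks, answer with the one closest to the client. `RegionSelector`, the region names and the latency figures are hypothetical:

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Simplified model of latency-based geo-routing: among healthy regions,
 * pick the one with the lowest measured latency to the client.
 */
final class RegionSelector {
    static Optional<String> pick(Map<String, Double> latencyMsByRegion, Set<String> healthyRegions) {
        return latencyMsByRegion.entrySet().stream()
                .filter(e -> healthyRegions.contains(e.getKey()))  // skip regions failing health checks
                .min(Map.Entry.comparingByValue())                 // closest region wins
                .map(Map.Entry::getKey);
    }
}
```

For a client near Madrid with invented latencies of 25 ms to Europe, 110 ms to North America and 240 ms to Asia, the answer is Europe; if Europe is marked unhealthy, the same rule falls back to North America, which is why multiple regions also help with resilience, not only latency.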
  15. … for the microservices jungle! ▪ Standardization ▪ Decoupling ▪ Data replication ▪ Resilient designs ▪ Automation ▪ Microservice support ecosystem ▪ Governance
  16. ▪ We all know microservices are cool ▪ We all want to do microservices!! ▪ We all know the advantages of microservices ▪ But microservice architectures ○ are complex ○ carry many hidden overheads ▪ In fact… you are going to build a distributed system, and distributed systems are hard!
  17. ▪ Programming language: Java ▪ Parent POMs with the most relevant dependencies ▪ Some (not many) libraries, e.g. metrics ▪ Standardized service archetype (Maven) ○ Ready to run in the Hotelbeds ecosystem ○ Produces binary rpms (Docker images soon!) ▪ REST APIs designed following similar principles ▪ Carefully chosen set of technology components ▪ Reference architectures on when and how to use components and libraries ▪ Technology radar
  18. ▪ Decoupling is essential to achieve the goals of microservices ▪ A good decoupled architecture… ○ Helps scale dev teams ○ Enables high scalability and efficiency ○ Supports future features naturally ▪ Independent deployments and life cycles for each service ▪ The API is the only point of access to the service (REST endpoint, Kafka, …) ▪ Data is private: no database access from external components
  19. ▪ Clean service boundaries are important ▪ One rule of thumb: a change shouldn't involve several microservices ▪ Domain-Driven Design offers a very useful set of tools: ○ Focuses on domain knowledge and its representation in code ○ Focus on strategic patterns ○ Bounded contexts as the basis for microservice boundaries ▪ Beware! ○ Service boundaries are hard to define! ○ It is easy to end up with a distributed monolith / microservice spaghetti / ...
  20. ▪ Data replication between services is crucial in a hybrid cloud / multi-[region|cloud] environment ▪ Each entity is owned by one service ▪ All other services access the owner service via its REST API or by consuming its Kafka messages ▪ Kafka messages contain exactly the same entities as the service's REST API ▪ Kafka is our message broker and basic tool to replicate data between services ○ Scalability ○ Partial order guarantees (ordering within a partition) ○ Kafka MirrorMaker for moving data across locations (check the Kafka talk by Isa and Alicia tomorrow!)
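"Partial order guarantees" means Kafka orders messages within a partition, not across a whole topic. Keying each message by its entity ID sends every change to that entity to the same partition, so consumers see that entity's changes in order. A simplified in-memory model of that property; `KeyedLog` and the key names are hypothetical, and `Arrays.hashCode` stands in for the murmur2 hash that Kafka's default partitioner actually applies to key bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified model of keyed partitioning: same key -> same partition,
 * and each partition is an append-only, ordered log.
 */
final class KeyedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    KeyedLog(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    int partitionFor(String key) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));  // stand-in for murmur2
        return Math.floorMod(hash, partitions.size());  // floorMod keeps negative hashes in range
    }

    void append(String key, String payload) {
        partitions.get(partitionFor(key)).add(key + "=" + payload);
    }

    List<String> partition(int index) {
        return List.copyOf(partitions.get(index));
    }
}
```

Changes to `hotel-42` always land in the same partition in append order, while different entities may interleave or live in other partitions; no ordering is promised between them, which is exactly the "partial" part.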
  21. Resilience: "the ability of a system to withstand changes in its environment and still function" (Wikipedia) ▪ We need to design with resilience in mind ▪ Favor self-healing architectures ▪ Remember! We are dealing with distributed systems ▪ Resilience patterns: check Uwe Friedrichsen's talks (SlideShare)
  22. ▪ Protect your services from the unexpected: ○ Overloads ○ Timeouts ○ Downstream errors ○ Datacenter failures ○ etc. ▪ Protect your services even if they are only exposed internally ▪ Focus on protecting each service individually ▪ Let good system behaviour emerge from good service-level practices
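Protecting each service individually against overloads, timeouts and downstream errors can start with two small mechanisms: a cap on in-flight calls that rejects extra work immediately (load shedding) and a bound on how long a caller waits. A minimal sketch using only `java.util.concurrent`; `ServiceGuard`, the semaphore cap and the fixed timeout are assumptions for illustration, not a Hotelbeds component:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical per-service guard: load shedding plus a caller-side timeout. */
final class ServiceGuard {
    private final Semaphore inFlight;
    private final long timeoutMillis;
    private final Executor executor;

    ServiceGuard(int maxConcurrent, long timeoutMillis, Executor executor) {
        this.inFlight = new Semaphore(maxConcurrent);
        this.timeoutMillis = timeoutMillis;
        this.executor = executor;
    }

    /** Returns the result of work, or fallback if the call is shed, times out or fails. */
    <T> T call(Supplier<T> work, T fallback) {
        // Load shedding: reject at once instead of queueing when saturated.
        if (!inFlight.tryAcquire()) {
            return fallback;
        }
        // The permit is released only when the work really ends, even if the
        // caller already gave up, so abandoned slow calls still count as load.
        CompletableFuture<T> guarded = CompletableFuture.supplyAsync(work, executor)
                .whenComplete((result, error) -> inFlight.release());
        try {
            // Timeout: never block the caller longer than timeoutMillis.
            return guarded.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return fallback;  // downstream too slow or failing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }
}
```

Holding the permit until the underlying work finishes is a deliberate choice: if permits were released on timeout, a slow dependency could pile up unbounded background work while the guard believed it was idle.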
  23. ▪ The Hystrix library provides several very useful resilience patterns: ○ Circuit breaker ○ Load shedding ○ Timeouts ○ Fallbacks ○ Retries [Diagram: DISTRIBUTION, INTERNAL, PRODUCT and 3rd PARTY services, with Suppliers]
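The circuit breaker is the pattern on this list that most benefits from seeing its state machine. The following is a minimal plain-Java sketch, not Hystrix's API: it trips after a number of consecutive failures, whereas Hystrix trips on an error percentage over a rolling window. `CircuitBreaker`, its thresholds and the injectable clock are hypothetical names chosen for the example:

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Minimal circuit breaker: after failureThreshold consecutive failures it
 * opens and fails fast to the fallback; after openMillis it lets a trial
 * call through (half-open) and closes again if that call succeeds.
 */
final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock;  // injectable time source, e.g. System::currentTimeMillis
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    <T> T call(Supplier<T> work, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get();  // fail fast: don't hammer a struggling dependency
        }
        try {
            T result = work.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return false;
            }
            state = State.HALF_OPEN;  // cool-down elapsed: probe the dependency
        }
        return true;
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

For brevity, this sketch lets every concurrent caller through while half-open; production breakers usually admit a limited number of trial calls.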
  24. "Minimizing coordination, or blocking communication between concurrently executing operations, is key to maximizing scalability, availability, and high performance." (from Coordination Avoidance in Database Systems, by Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein and Ion Stoica) ▪ Allow services / instances / threads to keep working independently of their peers, their dependencies and even their clients ▪ Favor push over pull strategies ▪ Favor asynchronous over synchronous ▪ Favor local caches over complicated grid / replicated caches ▪ The Reactive Manifesto
  25. ▪ Services publish entity changes to Kafka ▪ Client services can consume these streams and keep a local in-memory replica ▪ Important for high-transaction-volume / low-latency services ▪ Kafka log compaction guarantees that at least the latest message for each key is kept ▪ Every time an instance of a client service spins up, it can load the caches into memory and keep listening for changes ▪ Each service instance is independent: no communication between peers [Diagram: PRODUCT, MASTER DATA, DISTRIBUTION]
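Why a compacted topic is enough to rebuild a local cache: replaying only the latest record per key produces the same final state as replaying the full history, provided deletions are published as tombstones (a null value). A simplified in-memory model of both compaction and the replica; `Change`, `LocalReplica` and the keys are hypothetical, and real Kafka compacts log segments in the background and keeps tombstones for `delete.retention.ms` before removing them:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One change event; a null value is a tombstone (the entity was deleted). */
final class Change {
    final String key;
    final String value;

    Change(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

/** Hypothetical local in-memory replica fed by an entity-change topic. */
final class LocalReplica {
    private final Map<String, String> cache = new HashMap<>();

    /** Applies one event in log order; later events overwrite earlier ones. */
    void apply(Change c) {
        if (c.value == null) {
            cache.remove(c.key);
        } else {
            cache.put(c.key, c.value);
        }
    }

    /** On startup: replay the topic from the beginning, then keep applying live events. */
    void bootstrap(List<Change> topic) {
        topic.forEach(this::apply);
    }

    Map<String, String> snapshot() {
        return Map.copyOf(cache);
    }

    /** Simplified compaction: keep only the last record per key, in original order. */
    static List<Change> compact(List<Change> log) {
        Map<String, Integer> lastIndex = new HashMap<>();
        for (int i = 0; i < log.size(); i++) {
            lastIndex.put(log.get(i).key, i);
        }
        List<Change> out = new ArrayList<>();
        for (int i = 0; i < log.size(); i++) {
            if (lastIndex.get(log.get(i).key) == i) {
                out.add(log.get(i));
            }
        }
        return out;
    }
}
```

Because each instance builds its replica from the topic alone, no instance needs to talk to its peers, which matches the independence point on the slide.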
  26. ▪ Dedicated Delivery & Automation team ▪ DevOps roles inside scrum teams ▪ Automated CI/CD pipelines based on GitHub Flow ▪ Infrastructure automation ▪ Infrastructure as code ▪ Automated testing is key for continuous delivery: ○ Unit testing ○ Integration testing ○ Smoke tests and end-to-end testing using a framework based on TestNG
  28. ▪ Service catalogue based on Enterprise Architect and our own tools: ○ Architecture baselines ○ Ownership of services ○ Dependencies ○ Targets & transitions ▪ Dedicated Information Architecture team ▪ Clear process for provisioning new services ▪ Automation integration: no new deployments of deprecated components ▪ IT cost model tool
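"No new deployments of deprecated components" implies the pipeline can look components up in the service catalogue. A minimal sketch of such a gate, assuming the catalogue can be exported as a map from component name to status and that each service declares its components; `DeploymentGate`, the statuses and the component names are hypothetical, and treating unregistered components as violations is an extra assumption in the spirit of the provisioning process:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical CI/CD gate: compare a service's declared components with a
 * catalogue export and block anything deprecated or unregistered.
 */
final class DeploymentGate {
    enum Status { ACTIVE, DEPRECATED }

    private final Map<String, Status> catalogue;

    DeploymentGate(Map<String, Status> catalogue) {
        this.catalogue = catalogue;
    }

    /** Returns the reasons to block; an empty list means the deployment may proceed. */
    List<String> violations(List<String> components) {
        List<String> problems = new ArrayList<>();
        for (String component : components) {
            Status status = catalogue.get(component);
            if (status == null) {
                problems.add(component + " (not in catalogue)");
            } else if (status == Status.DEPRECATED) {
                problems.add(component + " (deprecated)");
            }
        }
        return problems;
    }
}
```

A pipeline step would fail the build whenever the returned list is non-empty and print the reasons.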
  29. ▪ Focus on integrating the three companies ▪ Reorganization into a product-based company ▪ Moving more business logic into microservices ▪ Multi-cloud ▪ Containers ▪ Keep improving our platform ○ More resilient ○ More agile ○ Better TTM (time to market)