Moving to microservices – a technology
and organisation transformational journey
Boyan Dimitrov,
Platform Automation Lead @ Hailo @nathariel
FROM LONDON TO TOKYO,
FROM MADRID TO OSAKA.
The beginning…
Back in 2011 we started simple
System overview:
• Started on AWS
• PHP frontend and Java backend applications
• Built and supported by a small team: 3-4 backend engineers
[Diagram: PHP frontend and Java backend on top of MySQL]
And then we started expanding rapidly
• City-specific environments
• Branching the code base
• Manually building infrastructure and configuration
[Diagram: three city-specific copies of the PHP/Java + MySQL stack]
As we grew, getting features out became a challenge
We quickly found out that extending monoliths is hard:
• Hard to maintain the codebase
• Any new feature took weeks to deliver
• Hard to scale the dev teams
The result: failure to deliver business value, and performance problems
Operating a monolith in the cloud got even harder
A lot of development and ops time wasted in firefighting:
• Lack of automation
• Multiple SPOFs
• No proper monitoring
• Unclear responsibilities
• No well-defined escalation or ownership process – many non-actionable alerts transformed into “all hands on deck” actions
So in 2013 we ended up doing…
We wanted to build a global platform
Looked at what we did wrong and redesigned it:
• Everything had to be automated – any workflow, any action
• Everything had to be resilient and self-healing – regardless of the failure source: infrastructure, network or code
• Each service had to be responsible for one thing and one thing only
Key challenges:
• We decided to move from PHP & Java to Go
• We had to build everything from scratch but move all our production traffic without any downtime
• We had to change our culture as we went
[Architecture diagram: two regions (eu-west-1 and us-east-1), each with an API Gateway, a message bus, Cassandra (C*), and many Go services]
We started with the building blocks
A service under the hood:
[Diagram: a handler (service layer) holding the business logic and storage access, built on a platform layer – a library abstracting service-to-service comms plus self-configuring external service adapters]
Any service gets for free:
• Service-to-service communication libs
• Discovery
• Configuration
• A/B testing capabilities
• Monitoring & Instrumentation
• … and much more
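As a rough illustration of that split, here is a minimal, self-contained Go sketch. None of these types or function names are Hailo's actual platform-layer API; it only shows that a service author writes and registers a handler, while everything beneath the registration call is owned by the shared library.

```go
// Hypothetical sketch of the handler / platform-layer split.
// The service author writes only the business-logic handler; the platform
// layer would own transport, discovery, config and instrumentation.
package main

import (
	"encoding/json"
	"log"
)

// Request and Response are the envelopes the platform layer hands to a handler.
type Request struct {
	Endpoint string
	Body     json.RawMessage
}

type Response struct {
	Body interface{}
}

// Handler is the only thing a service author has to implement.
type Handler func(req *Request) (*Response, error)

// platformLayer stands in for the shared library: in a real platform it would
// wire handlers into the message bus, register with discovery, load config...
type platformLayer struct {
	handlers map[string]Handler
}

func newPlatformLayer() *platformLayer {
	return &platformLayer{handlers: make(map[string]Handler)}
}

func (p *platformLayer) Register(endpoint string, h Handler) {
	p.handlers[endpoint] = h
}

func (p *platformLayer) Run(serviceName string) {
	// In a real platform this would block, serving requests from the bus.
	log.Printf("%s registered %d endpoints", serviceName, len(p.handlers))
}

func main() {
	svc := newPlatformLayer()

	// Business logic only: everything else comes "for free" from the layer below.
	svc.Register("ping", func(req *Request) (*Response, error) {
		return &Response{Body: "pong"}, nil
	})

	svc.Run("example-service")
}
```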
In preparation for the migration
Introducing a smart API Gateway made our life easy:
• Let us do a transparent, seamless migration from the user's perspective
• Gave us a lot of flexibility about how we route our traffic
• Enabled us to build a lot of failover capabilities
[Diagram: API Gateway in front of the monolith]
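The routing idea can be sketched in a few lines of Go using the standard library's reverse proxy. This is not Hailo's gateway, and the hostnames and paths are made up; it only shows how migrated endpoints can be peeled off while everything else still falls through to the monolith.

```go
// Minimal path-based routing sketch: selected endpoints go to a new service,
// everything else falls back to the monolith, invisible to clients.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func mustProxy(rawurl string) *httputil.ReverseProxy {
	u, err := url.Parse(rawurl)
	if err != nil {
		log.Fatal(err)
	}
	return httputil.NewSingleHostReverseProxy(u)
}

func main() {
	monolith := mustProxy("http://monolith.internal:8080")   // legacy PHP/Java stack (placeholder URL)
	customers := mustProxy("http://customers.internal:8081") // newly extracted Go service (placeholder URL)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Route only the endpoints that have already been migrated.
		if strings.HasPrefix(r.URL.Path, "/v1/customers") {
			customers.ServeHTTP(w, r)
			return
		}
		monolith.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":8000", nil))
}
```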
And then we started breaking down our monoliths
• We aimed to get production traffic on the new platform as quickly as possible
• We identified the low-hanging fruit first and rewrote those pieces
• We kept iterating on our platform and building more tools as we needed them
[Diagram: API Gateway gradually decoupling traffic from the monolith]
At present we have
• Microservices ecosystem (99.9% written in Go)
• Designed specifically for the cloud – different building blocks and
components will constantly be in flux, broken or unavailable
• 1000+ AWS instances spanning multiple regions
• 200+ services in production
The Platform
Image: “Troll A platform” by Swinsto101 / CC BY-SA 3.0 / desaturated from original
[Diagram: the platform in layers – Cloud Provider (VPC, EC2, Auto Scaling, S3, CloudFormation, Route 53, Redshift, EIP), Orchestration (Env, DNS, Release, AutoScaling, Compute, Whisper), Core Platform (Discovery, Monitoring, Routing, Provisioning, Login, Config), and Services on top]
Cloud Provider
• Lowest-level building blocks
• We mostly use basic PaaS components and services as they cover most of our needs
• We expect every underlying component to fail and we designed for this
[Diagram: Cloud Provider components – VPC, EC2, Auto Scaling, S3, CloudFormation, Route 53, Redshift]
[Diagram: eu-west-1 across three AZs (eu-west-1a/b/c), each with an API proxy layer, a RabbitMQ message bus node, many Go services, and shared infra (Cassandra, NSQ, ZooKeeper)]
Core principles
• We use auto scaling groups for everything
  – Guarantees each component can be rebuilt automatically
  – Including our database clusters that run on ephemeral storage (we do keep 6 copies of each piece of data in 2 regions)
• Minimum of 3 AZs in every region
• Every workflow is automated
• Every component has to be self-healing and scalable
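As a hedged illustration of the first principle (not Hailo's actual provisioning tooling), this is roughly what creating an auto scaling group spanning three AZs looks like with the AWS SDK for Go; the group name, launch configuration and sizes are placeholders.

```go
// Sketch only: encoding "auto scaling groups for everything, 3 AZs per region"
// with the AWS SDK for Go. Names and sizes are illustrative.
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("eu-west-1"),
	}))
	svc := autoscaling.New(sess)

	_, err := svc.CreateAutoScalingGroup(&autoscaling.CreateAutoScalingGroupInput{
		AutoScalingGroupName:    aws.String("example-service-v42"),
		LaunchConfigurationName: aws.String("example-service-v42-lc"),
		MinSize:                 aws.Int64(3), // at least one instance per AZ
		MaxSize:                 aws.Int64(9),
		AvailabilityZones: aws.StringSlice([]string{
			"eu-west-1a", "eu-west-1b", "eu-west-1c",
		}),
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("auto scaling group created; instances can now be rebuilt automatically")
}
```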
Orchestration
• Our “cloud provider abstraction” layer
• Main purpose is infrastructure and workflow automation and discovery
• Has a global view of everything happening across our infrastructure
• Provides additional capabilities on top of AWS
• The only services directly aware of our cloud provider specifics – gives us a lot of flexibility and lets us introduce changes quickly
[Diagram: orchestration services – Env, DNS, Release, AutoScaling, Compute, EIP, Whisper]
Everything in our platform emits events
So naturally we want to capture all external events as well!
Whisper Service
• It's all about event-driven compute – think Lambda, but within our platform
[Diagram: external sources publish events through Whisper onto NSQ topics; hundreds of publishers & subscribers consume them and trigger actions]
• To subscribe to any new event source we only have to change a single service
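The slides don't show Whisper's internals, but the surrounding pattern is plain NSQ publish/subscribe. The sketch below uses the go-nsq client with made-up topic, channel and address values to show both sides: an adapter publishing an external event onto a topic, and a subscriber reacting to it.

```go
// Minimal go-nsq sketch of the publish/subscribe pattern around NSQ topics.
// Topic, channel and address values are placeholders, not Whisper's real ones.
package main

import (
	"log"

	nsq "github.com/nsqio/go-nsq"
)

func main() {
	cfg := nsq.NewConfig()

	// Subscribe: any service can attach a channel to a topic and react to events.
	consumer, err := nsq.NewConsumer("payment.events", "fraud-checker", cfg)
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		log.Printf("event received: %s", m.Body)
		// ...trigger whatever action this subscriber is responsible for...
		return nil // returning nil acknowledges the message
	}))
	if err := consumer.ConnectToNSQLookupd("127.0.0.1:4161"); err != nil {
		log.Fatal(err)
	}

	// Publish: a Whisper-style adapter would push external events onto a topic.
	producer, err := nsq.NewProducer("127.0.0.1:4150", cfg)
	if err != nil {
		log.Fatal(err)
	}
	if err := producer.Publish("payment.events", []byte(`{"type":"charge.succeeded"}`)); err != nil {
		log.Fatal(err)
	}

	select {} // keep consuming
}
```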
Core Platform
Provides the most essential platform functions for every service:
• Service Discovery
• Service Provisioning
• Routing & Load Balancing
• Authentication/Authorization
• Monitoring
• Configuration
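To make the list concrete, here is a purely illustrative Go sketch of the kind of client interfaces such a layer might expose to services. These are not Hailo's real APIs, just the shape of discovery and configuration as described above.

```go
// Illustrative interfaces for core-platform client libraries (hypothetical names).
package platform

// Discovery lets a service find healthy instances of another service.
type Discovery interface {
	// Instances returns addresses of currently healthy instances of a service.
	Instances(serviceName string) ([]string, error)
	// Register announces this service so others can discover it.
	Register(serviceName, addr string) error
}

// Config serves per-service configuration that can change at runtime.
type Config interface {
	// String returns the value at a config path, or the fallback if unset.
	String(path, fallback string) string
}
```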
Services
• Self-contained units of execution
• Built around business capabilities or domain objects
• Small enough to be rewritten in a few days
• Independently scalable
• They are all about adding business value
All good but did we make our development any faster?
Up and running in seconds:
• Setup ~ 1 sec
• Trigger a build ~ 1 sec
• Vetted & tested & built ~ 100–140 sec
Deploying a service
[Screenshot: deployment tool – choose service name, version and auto scaling group]
Smart traffic shaping
[Diagram: live traffic split between two versions of Service A – 5% / 95%]
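The shaping itself boils down to weighted routing. Below is a toy Go sketch (not the actual router) of sending roughly 5% of requests to a new version of a service and the rest to the stable one.

```go
// Toy weighted-routing sketch behind the traffic-shaping idea.
package main

import (
	"fmt"
	"math/rand"
)

// pickVersion returns the canary version for roughly newWeight of all calls.
func pickVersion(newWeight float64) string {
	if rand.Float64() < newWeight {
		return "service-a-v2" // canary: ~5% of traffic (placeholder name)
	}
	return "service-a-v1" // stable: ~95% of traffic (placeholder name)
}

func main() {
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pickVersion(0.05)]++
	}
	fmt.Println(counts) // roughly 500 vs 9500
}
```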
We can get a completely new service in production in hours
Hailo Warp Speed (measured in many μs or something)
What about operating our platform?
Live traffic tracing
Microservices are all about the tooling!
Live request tracing
Key Learnings
• Automate everything – it enabled us to do more with less
• Identify your KPIs and track them
• Invest in tooling: the complexity in a microservices architecture is not in your application code anymore – it is in the thousands of service interactions!
• Empowering your engineers increases your velocity tremendously!
• Moving to microservices is a journey – make sure you take everyone on board!
Thanks!
@nathariel
boyan@hailocab.com
@HailoTech
