Gilt.com (founded 2007) started out as a monolithic Rails application built and maintained by just a handful of engineers. Gilt has since become a global e-commerce destination built upon a sophisticated Scala/Java micro-services architecture strong enough to handle the company’s intense traffic spikes, generated by millions of members simultaneously visiting the site at noon each day. The concept of micro-services (smaller, lighter, faster components delivered swiftly to production) has resolved many of the growing pains we experienced evolving from “just a handful of engineers” into an engineering team of 100+. However, micro-services bring their own set of tradeoffs. In this talk I'll discuss the evolution of Gilt's micro-service architecture and the challenges we face today. How do you move a large micro-service deployment to the cloud? How do micro-services affect ownership and software quality? Why do APIs really matter? How can the micro-service approach be applied to front-end web applications? And why has our adoption of micro-services led us to the conclusion that all our teams should prefer to deploy and test software in production?
13. Gilt’s Microservice Hype Cycle
2011: Boo: we have a monolith! Maybe these micro-services can help us move faster!
2012: This is AMAZING!
2013: Look at all these services!
2014: Holy cr&p, what have we done? Look at ALL these services
2015: Let’s get a handle on this
2016: Ah, the sweet taste of awesome sauce.
19. driving forces behind gilt’s emergent architecture
● team autonomy
● voluntary adoption (tools, techniques, processes)
● kpi or goal-driven initiatives
● failing fast and openly
● open and honest, even when it’s difficult
26. Lift-and-shift + elastic teams
Existing Data Centre
Dual 10Gb direct connect line, 2ms latency.
‘Legacy VPC’
Mobile, Common, Personalisation, Admin, Data
(1) Deploy to VPC
(2) ‘Department’ accounts for elasticity & devops
32. We (heart) μ-services
● Lessen dependencies between teams: faster code-to-prod
● Lots of initiatives in parallel
● Your favourite <tech/language/framework> here
● Graceful degradation of service
● Disposable Code: easy to innovate, easy to fail and move on.
33. We (heart) cloud
● Do devops in a meaningful way.
● Low barrier of entry for new tech (DynamoDB, Kinesis, ...)
● Isolation
● Cost visibility
● Security tools (IAM)
● Well documented
● Resilience is easy
● Hybrid is easy
● Performance is great
34. Lessons from the Slope:
1. µservice architecture is emergent
2. manage ownership & risk
3. make your clients thin
4. avoid snowflakes
5. test in production where possible
37. It’s hard to think of architecture in one dimension.
n = 265, where n is the number of services.
38. … we used a “spread sheet”.
‘The Gilt Genome Project’
39. It’s hard to think of architecture in one dimension.
We added ‘Functional Area’, ‘System’ and ‘Subsystem’ columns to Gilt Genome; this provides a strong (although subjective) taxonomy.
It turns out we have an elegant, emergent architecture.
Some services / components are deceptively simple.
Others are simply deceptive, and require knowledge of their surrounding ‘constellation’.
n = 265, where n is the number of services.
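A minimal sketch of how a three-level taxonomy like this could be queried once it lives in a spreadsheet. The rows, service names and subsystem names below are invented for illustration; the real Gilt Genome columns and contents are not shown in the slides.

```python
from collections import Counter, defaultdict

# Hypothetical genome rows: (service, functional_area, system, subsystem).
# All names are made up for this sketch; they are not Gilt's actual services.
GENOME = [
    ("order-intake", "Back Office", "Order Processing", "Intake"),
    ("order-router", "Back Office", "Order Processing", "Routing"),
    ("gift-card-ledger", "Back Office", "Payments", "Gift Cards"),
    ("push-notifier", "Mobile", "Notifications", "Push"),
]


def build_taxonomy(rows):
    """Nest services as area -> system -> subsystem -> [service, ...]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for service, area, system, subsystem in rows:
        tree[area][system][subsystem].append(service)
    return tree


def constellation_sizes(rows):
    """Count services per (area, system) pair, i.e. per 'constellation'."""
    return Counter((area, system) for _svc, area, system, _sub in rows)
```

Counting per (area, system) pair gives the kind of per-constellation component counts that appear on the logical architecture slide, and makes it easy to see where the complexity concentrates.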
42. Gilt Logical Architecture - Back Office Systems
Gilt Admin (Legacy Ruby on Rails Application): City, Discounts, Financial Reporting, Fraud Mgmt, Gift Cards, Inventory Mgmt, Order Mgmt, Sales Mgmt, Product Catalog, Purchase Orders, Targeting, Billing
Other Admin Applications (Scala + Play Framework)*: City, Creative (2), CS, Discounts, Distribution, i18n, Inventory (2), Order Processing (2), Util
Service Constellations (Scala, Java)*: Auth (1), Billing (1), City (6), Creative (4), CS (2), Discounts (1), Distribution (9), i18n (3), Inventory (6), Order Processing (8), Payments (3), Product Catalog (5), Referrals (1), Util (2)
Core Database - ‘db3’
Job System (Java, Ruby)
* counts denote number of service / app components.
Simply deceptive: service context only makes sense in constellation.
43. Emergent Architecture:
Using the three-level taxonomy approach, we’ve been able to get a better understanding of an emergent architecture, at a department level, and where the complexity lies.
We’ve also concluded that the department is the right level of granularity for consensus on technical decisions (language, framework, …)
Gilt’s Architecture Board sets the overall standards that teams must follow when interacting across departmental boundaries. HTTP. REST. DNS. AWS.
45. Bottom-up ownership, RACI-style
1. Software is owned by departments, tracked in ‘genome project’. Directors assign services to teams.
2. Teams are responsible for building & running their services; directors are accountable for their overall estate.
46. Notes:
Zero Power, High Influence: The Architecture Board https://github.com/gilt/arch-board
Gilt Standards and Recommendations: https://github.com/gilt/standards
49. 30%
Amount of time a department should spend on operations / maintenance / red-hot.
We build the notion of SRE (Site Reliability Engineering) into the team.
51. Getting a handle on ownership...
[Figure: four panels labelled Jul 2015, Sep 2015, Oct 2015 and Feb 2015]
52. Emergent Architecture + Ownership Oriented Org:
“You just pulled an inverse Conway manoeuvre”
Architectural Area: Back-Office, Personalisation, Mobile, Web & Core Services
Department: Back-Office, Personalisation, Mobile, Web & Core Services
54. Take as few code dependencies as possible. This stuff HURTS when n ~= 300.
Service Repo: Service Code, Common Code, Client Code, Service Dependencies
Client JAR
Consumer Repo: Consumer Dependencies
Dependency hell as client JAR dependencies conflict with service dependencies.
55. This is way easier. http://apidoc.me
apidoc: define RESTful service API agnostically and generate dependency-free, thin clients.
Service Repo: Service API <<apidoc>>, Service Code, Service Dependencies
Generated from the API: Client Code <<generate>>, Service Stub <<generate>>
Consumer Repo: Consumer Dependencies
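To make "thin client" concrete, here is a hand-written sketch of what a dependency-free client might look like: it uses only the standard library, so a consumer that takes it inherits none of the service's dependencies. This is not apidoc's generated output; the `ThinClient` class, the `/items/{id}` endpoint and the injectable `fetch` parameter are assumptions made for illustration.

```python
import json
from urllib import request


class ThinClient:
    """Stand-in for a generated client: standard library only, no shared code."""

    def __init__(self, base_url, fetch=None):
        self.base_url = base_url.rstrip("/")
        # fetch is injectable so the client can be exercised without a network.
        self._fetch = fetch or self._http_get

    @staticmethod
    def _http_get(url):
        # Plain HTTP GET using urllib; returns the response body as text.
        with request.urlopen(url, timeout=5) as resp:
            return resp.read().decode("utf-8")

    def get_item(self, item_id):
        # Hypothetical endpoint; a real client would mirror the service's API spec.
        body = self._fetch(f"{self.base_url}/items/{item_id}")
        return json.loads(body)
```

Because the client knows only HTTP and JSON, upgrading a library inside the service cannot cause a version conflict in the consumer, which is the failure mode the previous slide describes.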
58. 6
Andrey’s Rule of Six:
“We could solve this now, or, just wait six months, and Amazon will provide a solution”
Andrey Kartashov, Distinguished Engineer, Gilt.
59. Current thinking on deployment:
(1) Re-use as much AWS tooling as possible: CodePipeline, CodeDeploy, CloudFormation.
(2) Very lightweight tool chain to support dark canaries, canary releases, phased roll-out and roll-back: NOVA
https://github.com/gilt/nova
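As a rough illustration of phased roll-out with roll-back, the sketch below walks a list of traffic percentages and rolls back to zero as soon as one stage looks unhealthy. It does not reflect NOVA's or AWS CodeDeploy's actual interfaces; the function name, the `error_rate_at` callback and the 1% threshold are all assumptions.

```python
def phased_rollout(stages, error_rate_at, max_error_rate=0.01):
    """Advance through traffic percentages; roll back on the first bad stage.

    stages: increasing traffic percentages for the new version, e.g. [1, 10, 50, 100].
    error_rate_at: hypothetical callback returning the observed error rate
        once the new version is serving the given percentage of traffic.
    Returns (final_percent, history), where final_percent is 0 after a roll-back.
    """
    history = []
    for percent in stages:
        rate = error_rate_at(percent)
        history.append((percent, rate))
        if rate > max_error_rate:
            # Unhealthy stage: send all traffic back to the old version.
            return 0, history
    return (stages[-1] if stages else 0), history
```

In practice the callback would read from the monitoring system, which is one reason the later slides argue for investing in monitoring and alerting.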
61. Testing and TiP
Maintaining stage environments in a micro-service architecture is HARD.
Prefer to test in production where possible: use dark canaries, canaries, staged roll-out and roll-back.
Invest in monitoring and alerting over hard-to-maintain test pipelines.
Where teams need a stage environment, let them build a minimal environment, and manage it themselves.
Estimate: about 85% of Gilt’s teams use TiP techniques; 15% need a stage environment.
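The slides do not define "dark canary". One common reading, assumed here, is a new version that receives a copy of production requests while its responses are discarded, so it can be observed without affecting users. The sketch below shows that idea only; the function and parameter names are hypothetical.

```python
def handle_with_dark_canary(req, stable, canary, mismatches):
    """Serve from the stable version while shadowing the request to the canary.

    stable, canary: callables taking a request and returning a response.
    mismatches: list that collects (request, stable_response, canary_result)
        whenever the canary disagrees with the stable version or raises.
    The caller always receives the stable response.
    """
    response = stable(req)
    try:
        shadow = canary(req)
        if shadow != response:
            mismatches.append((req, response, shadow))
    except Exception as exc:
        # A canary failure must never reach the user; record it instead.
        mismatches.append((req, response, exc))
    return response
```

A real setup would mirror traffic asynchronously at the load balancer or proxy rather than in the request path; the inline call here just keeps the sketch short.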
62. Lessons from the Slope:
1. µservice architecture is emergent
2. manage ownership & risk
3. make your clients thin
4. avoid snowflakes
5. test in production where possible