Context = important. This presentation is descriptive, not prescriptive.
Netflix, a huge, global video streaming service, is going to do things differently than a government contract shop; a 1000-person development team will approach problems differently from a 10-person team.
How many users do we have? What kind of users are they? What are their needs for availability, performance, etc.?
Don’t get caught up in what’s right and wrong; it’s trade-offs, and those trade-offs depend on who you are, what you do, and why you do it.
Background up front to establish that context, and I’ll talk about things like org structure / team size throughout to help frame how we thought about our problems in our situations at the time.
My name’s Rob, engineer at Hudl on our Foundation tribe.
Build the platform and lifecycle that helps squads across the rest of the company ship code and products quickly and safely.
On microservices transition team about 2.5 years ago.
At Hudl, we build sports software that helps teams and athletes win.
Teams record video at their games and practices.
They upload that video to hudl.com, and we provide tools to help them break down and analyze the video they record, then share it with coaches, athletes, and analysts…
who use it to better understand their game, build reports to identify tendencies and areas to improve in the future.
Highlights for athletes and teams to showcase and share.
Notification and event feeds for team communication...
Playbooks that integrate with video
Significant number of other features.
Our customers. These are the people we solve problems for.
4.5MM active users in dozens of sports in over 70 countries
130k teams, from small, local basketball league teams all the way up to elite teams in leagues like the NFL, EPL
10k reqs/second during American football season
Mainly C# / JS / MongoDB - Running on IIS on Windows Server, a number of frameworks (ASP.NET MVC, React).
A lot of supplementary tech as well for cache, queue, etc.
All cloud, AWS. No on-premises hosting or datacenters.
Small, cross-cutting teams that focus on a particular domain.
Developers, designers, QA, Product Manager pave their own way and ship at their own speed.
Volleyball squad that figures out what they need to do to solve customers’ problems.
Most squads ship early and often. MVP: playbook @ 400 deploys and counting.
Anyone can deploy; in fact, most often our QA or PMs are the ones pushing code to production.
Deploys: zero downtime, fast and easy; moving fast comes with some risk, and the ability to react to that quickly is important.
No gatekeepers or throwing-over-the-wall; squads operate their services. If there’s a memory leak, they’re the ones remoting on and digging into it.
We measure something called Branch Lifecycle. Branch is the atomic unit of “shipping” for us, want to know how long it takes a branch to get from creation to production.
Histogram breaking down that lifecycle duration for all the branches our teams have shipped in the last 30 days.
The blue bars are histogram buckets for the first 24 hours, and the green bars are buckets for each day afterward.
Example: Just over 60 branches have gone from created to shipped in less than one hour.
Median lifecycle 32 hours; half of the team’s branches go out in < 1.5 days.
My goal is to make sure we have a platform that enables a wide spectrum here. Don’t force the fast iterations, but enable them.
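A minimal sketch of how a lifecycle histogram like this could be computed, assuming each branch's duration is already known. The function names and sample durations are made up for illustration; this is not Hudl's tooling or data.

```python
from datetime import timedelta
from statistics import median


def bucket_label(duration: timedelta) -> str:
    """Hourly buckets for the first 24 hours, daily buckets after that."""
    hours = duration.total_seconds() / 3600
    if hours < 24:
        return f"{int(hours)}h"
    return f"{int(hours // 24)}d"


def lifecycle_histogram(durations):
    """Count how many branches fall into each bucket."""
    counts = {}
    for duration in durations:
        label = bucket_label(duration)
        counts[label] = counts.get(label, 0) + 1
    return counts


def median_hours(durations):
    """Median branch lifecycle, in hours."""
    return median(d.total_seconds() / 3600 for d in durations)


# Hypothetical sample: time from branch creation to production.
sample = [
    timedelta(minutes=40),
    timedelta(hours=5),
    timedelta(hours=32),
    timedelta(days=3),
    timedelta(days=3, hours=2),
]

print(lifecycle_histogram(sample))
print(median_hours(sample))
```

The split between hourly and daily buckets mirrors the blue and green bars described above.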
TRANSITION: For us, arch/plat/lifecycle is a means to keep us nimble as a product team so we can deliver fixes and features to our coaches and athletes quickly. Transition to microservices was one thing that helped us achieve that as we’ve grown our team.
Ask question.
If you’re interested in talking specifics about your architecture and how you think about microservices, find me afterward; I’d love to chat more about it.
Transition: don’t take microservices lightly
Need teams to build these and support them if they break. You need test and local development environments that can work with this model. Your company’s business is probably not to “build a microservices architecture.” Time spent away from building products for your customers, and doing the things that actually make your business money.
Runtime complexity, also. Change the way you think about writing code to be more tolerant of individual service failures. Build your application differently.
The universe you have to invent and live in for microservices is much more complex. Don’t take it lightly.
There are existing tools. Buy or OSS. There are smaller pieces that you can build with, or complete PaaS offerings. Read, read, read.
I will be prescriptive here. You shouldn’t try to invent the universe yourself. Don’t go writing a strongly-consistent service registry yourself. Solved problems, leverage them.
C# + Windows made this challenging for us, which is why we hand-rolled some. Custom deployment was one of those; IIS + Windows, and Windows servers are slower to start, so rolling deploys made our deploy process slower; swap out deployment payloads on existing instances
Know the trade-offs. How much control do you need over the pieces? How complex is it to operate?
TRANSITION: Tons of advantages and disadvantages between monoliths and microservices. Highlight a couple that are important to us relative to dev speed.
We do this a lot. Anyone can make changes to others’ codebases.
I’ve done a lot of development on our users service (manages users and authentication) but team mgmt may want to make a change to our role management.
Awesome! Work I don’t have to do. No work order, wait for backlog, hope it was to spec.
Code reviews. Probably different from some orgs that have more rigid ownership.
Recruit - Athletes play at the next level of their career, college recruiting programs
Profiles - Public athlete pages, showcase highlights and other strengths
Profiles squad new feature, show colleges an athlete is being recruited to on their profile.
Add some data and query it a bit differently on the Recruit side.
Example: recruiting college, display on athlete profile page
Today, these two domains are microservices for us.
Data layer changes on recruit.
Expose via API. Contract, versioning, backwards compatibility.
Get that code pushed first.
Make changes on profiles; now it’s an API call over the network. Data transport, service discovery.
Have to consider failure, set timeouts.
Render to view.
If bug found during testing, especially on the recruit changes, do that dance all over again.
Monoliths have the advantage here: much simpler, much faster to code.
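To make the "consider failure, set timeouts" step concrete, here is a rough sketch of what the profiles side might look like once the recruit data is behind a network call. The URL, endpoint shape, and `colleges` field are all hypothetical, and returning an empty list on failure is only one possible fallback (it lets the profile page still render).

```python
import json
import urllib.request

# Hypothetical internal endpoint; not a real Hudl URL.
RECRUIT_URL = "http://recruit.internal.example/api/v1/athletes/{athlete_id}/colleges"


def http_get_json(url, timeout_seconds):
    """Fetch a URL and parse the body as JSON, giving up after the timeout."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))


def recruiting_colleges(athlete_id, fetch=http_get_json, timeout_seconds=0.5):
    """Return the colleges recruiting an athlete, or [] if recruit is unavailable.

    `fetch` is injectable so the failure path can be exercised without a network.
    """
    url = RECRUIT_URL.format(athlete_id=athlete_id)
    try:
        payload = fetch(url, timeout_seconds)
    except (OSError, ValueError):
        # OSError covers connection errors and timeouts; ValueError covers bad JSON.
        return []
    if not isinstance(payload, dict):
        return []
    return payload.get("colleges", [])
```

In a monolith the same feature is a direct method call, with no timeout or fallback to think about.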
Describe old monolith deploy lifecycle
For us, 30m each. If problems or rollback, longer. Blocks everyone.
Loosely coupled and independently deployable, many queues
Also “Blast radius”, problem with deployment or code is limited.
TRANSITION: These two concepts (cross-domain dev, deploy orchestration) are both important to us.
Want to allow teams to have the flexibility to quickly make changes across domains.
Want to be able to get code out to production quickly when it’s ready to go.
Second one, having to queue up for deploys, was one of the biggest catalysts for our transition
Code deploys / week since 2012
Test deploys, prod deploys
Orange line, max @ 90/week (18/day)
Vertical line, first micro in Jan 2014
Why was that becoming more of a problem for us at that time?
Had been slowly growing our development team. Blue line is product team size.
From a small team of maybe a dozen developers or so to a slightly larger team, with 46 contributors to the same code base. 6-7 squads, each working on a different domain.
Knew we were going to hire a lot in the coming years, so it was only going to get worse.
Monolith for 6-7 years. Had reorganized our teams to the point where the underlying arch was fighting organizational structure.
Remnant of a smaller company with a single team split among tasks in the same website. Narrower customer focus (US HS FB coaches and athletes).
To follow Conway, we’d need something like this.
Break it up, let teams create and work on their services.
Go into some detail on these blue boxes, these microservices that our teams work on today. Describe the architecture, and a couple things we do to let teams work on them quickly.
Describe speedtest, coach runs, we persist, load in admin.
Support asks where uploading from, suggests alternatives
Full-stack microservices, everything a squad needs to build an application end-to-end and deliver it as a service.
ASP.NET MVC
Serves static resources for client-side apps, views, APIs, and inter-service endpoints.
Data layer
Stateless at runtime, nothing that can’t be reconstructed quickly on startup. State and data is stored in databases like MongoDB, caches, external queueing systems, etc. Easier to manage lifecycle in production.
NGINX in front. Takes incoming web requests and does routing and load balancing.
Can autoscale clusters independently based on load or seasonality
35 different clusters, each covering a domain. One of these is still our monolith, still co-existing alongside all of the newer microservices we create.
Run several NGINX nodes for capacity and fault tolerance.
Amazon’s Elastic Load Balancer in front
In triplicate, one in each of three Amazon AZs
NGINX for both load balancing and smart routing
It looks at the requested path (/speedtest) and needs to figure out where to send it, because only a few servers specific to that application can field the route. Routes coded into service itself
Easy for devs to add new routes, it’s coded right into their app
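A small sketch of the path-based routing idea, assuming each service declares the path prefixes it can field and a router picks the longest matching prefix. This is plain Python for illustration, not NGINX configuration and not Hudl's implementation; the class name, addresses, and the catch-all `/` route are assumptions.

```python
class RouteTable:
    """Maps URL path prefixes to the backends that can serve them."""

    def __init__(self):
        self._routes = {}  # path prefix -> list of backend addresses

    def register(self, prefix, backends):
        """Called on behalf of a service to declare a route it owns."""
        normalized = prefix.rstrip("/") or "/"
        self._routes[normalized] = list(backends)

    def resolve(self, path):
        """Return the backends for the longest matching prefix, or None."""
        best = None
        for prefix in self._routes:
            # Match whole path segments only, so /speedtester is not /speedtest.
            boundary = prefix if prefix.endswith("/") else prefix + "/"
            if path == prefix or path.startswith(boundary):
                if best is None or len(prefix) > len(best):
                    best = prefix
        return self._routes[best] if best is not None else None


# Hypothetical addresses for illustration.
table = RouteTable()
table.register("/speedtest", ["10.0.1.5:80", "10.0.1.6:80"])
table.register("/", ["10.0.0.2:80"])  # assumed catch-all

print(table.resolve("/speedtest/results"))
print(table.resolve("/speedtester"))
```

Keeping route declarations next to the application code means adding a route is part of an ordinary code change.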
Isolated mongo, admin page doesn’t reach into speedtest mongodb. Contract.
Mitigate the pain that’s introduced into the cross-domain development workflows.
For us, important to make this easy for developers, so they don’t have to re-invent service discovery, serialization, and load balancing.
On the Speedtest side, publish an interface with service methods. Automatically build a Nuget package that contains the interfaces and types - lightweight contract package
If our admin service wants to make this call, they import the client package.
Use it as a Type passed to a ServiceLocator that we’ve written.
ServiceLocator intercepts the call.
Uses attributes on the interface to help locate the service with its internal route table built from Eureka.
Injects an HTTP call and makes it over to speedtest.
Deserializes the result back into the DTO.
Looks RPC-like, but the interceptor does JSON over HTTP.
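A rough Python analogue of that "looks RPC-like, is JSON over HTTP" pattern. The real ServiceLocator is C# and uses interface attributes and a Eureka-backed route table; everything below (`ServiceProxy`, the URL layout, the method name, the injected `transport`) is a hypothetical stand-in to show the shape of the idea.

```python
import json


class ServiceProxy:
    """Turns attribute-style method calls into JSON-over-HTTP requests."""

    def __init__(self, service_name, route_table, transport):
        self._service_name = service_name
        self._route_table = route_table  # service name -> list of hosts
        self._transport = transport      # callable(url, body) -> response text

    def __getattr__(self, method_name):
        # Only reached for names that are not real attributes, i.e. remote methods.
        def call(**kwargs):
            host = self._route_table[self._service_name][0]
            url = f"http://{host}/{self._service_name}/{method_name}"
            body = json.dumps(kwargs)                  # serialize the arguments
            response_text = self._transport(url, body)
            return json.loads(response_text)           # deserialize the result
        return call


# A fake transport so the sketch runs without a network.
sent = []


def fake_transport(url, body):
    sent.append((url, body))
    return '{"mbps": 12.5}'


routes = {"speedtest": ["10.0.1.5:80"]}
speedtest = ServiceProxy("speedtest", routes, fake_transport)

# Reads like a local call; the proxy does the HTTP work.
result = speedtest.get_latest_result(team_id=42)
print(result)
```

Because every call goes through one interception point, cross-cutting behavior can be added there without touching callers.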
Lets us solve a lot of problems for the caller: Retries, Load Balancing, Health Checks; patterns like Circuit Breaker
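As one example of those caller-side patterns, here is a minimal circuit breaker sketch. The thresholds, names, and half-open behavior are assumptions chosen for illustration, not a description of Hudl's implementation.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""


class CircuitBreaker:
    """Fails fast after repeated failures, then retries after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_seconds=30.0,
                 clock=time.monotonic):
        self._failure_threshold = failure_threshold
        self._reset_after = reset_after_seconds
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_results, team_id)`, where `fetch_results` is whatever function makes the request.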
Abstracted away the complexity and lets teams get to work on the important stuff: solving problems for our users and not solving problems specific to the architecture they’re building on.
TRANSITION: Microservices have been pretty helpful for us in terms of scalability - I mentioned earlier how we can scale services independently, which is really convenient.
However, when it comes to scalability, in our experience, microservices have been much more beneficial for scaling our organization and teams, and not as much about scaling the application itself.
Not downplaying the other advantages of microservices. In terms of scalability, app scalability (# users, amount of data) is a solvable problem in almost all architectures. Stack Overflow as an example. Code optimization, hardware, tuning.
670MM pageviews / month on a small, powerful setup. A few different code bases. 9 webservers, 1/1 HAProxy, 2/2 SQL.
Microservices let us add more teams or restructure the teams we have, and let our architecture flow with that restructuring.
Inspired by Spotify.
Organized into tribes; 4 primary tribes, about 50 people average per.
Tribes are a large business unit. Have a tribe focused on media and fan-facing content. Another focused on Coaching Tools for all sports.
Tribe is composed of squads, cross-cutting teams with dev/QA/PM/design. CT tribe has Football, Basketball, Volleyball.
We re-organize fairly frequently. New business opportunities, or shifting business focus. Solve different problems or build new products.
In practice, because we re-org squads frequently, services don’t really line up 1:1.
Temporal ownership. Squads tend to own their services while they’re devving and releasing on them. Inevitable that we shift squads.
Tribes own ops duty and alerting. Each has an on-call rotation they manage.
Responsible for making sure they’re staying up to date with new library versions.
Organizational restructuring is the reason for the loosely-coupled relationship between squads and services, and is also one of the primary reasons that new microservices are introduced.
A few other reasons that new services get created.
What causes introduction of new services?
Structural/Direction:
Reorganization (basketball -> split)
Often implies new product development by existing squad (conversion -> getpaid [new signup])
Shifting project focus on a squad (platform -> users)
Adding/replacing new functionality to an existing service (recruit + recruitsearch)
Reactionary, often because of service getting too large:
Deploy queue
Build times
Not wanting to work in the monolith (10k files, 2MM lines)
Targeted migration; still have monolith that handles a good number of domains; if we feel like that’s a risk as our product changes, we’ll migrate code/data out.
Great example that combines several of those reasons, both reorganization and reactionary, can be seen in the way we’ve built our basketball product over the last couple years.
That service grew quite large. Jokingly a “minilith” or “microlith”.
If we had known at the start that we’d have split off several services, would we have started? I don’t think we would have.
Architecting multiple services up front wouldn’t have let us prove or disprove that product as quickly as we were able to. Letting it grow larger let the team leverage faster cross-domain development
Basketball, not “micro” for sure, and that’s okay with us.
Loose - independent and isolated
Service - communicate by contract
Bounded - understood, intentionally scoped domain
We don’t prescribe a max size for services, and have a range of differently-sized services.
Speedtest / Users / Recruit
Mega services, still an intentional scope or domain.
“Monolith” gets a bad rap, and the word “micro” doesn’t need to be the emphasis when talking microservices; services can grow large and still adhere to these principles.
That lets us leverage the benefits of microservices while getting some of the development speed strengths of the monolith when it comes to MVPing, experimentation, and cross-domain development.
Parity: differences between the way our prod and test/dev environments are set up. Pulsar + local NGINX, co-locate apps on test servers to save cost. A lot of maintenance and operational overhead in test, and bringing closer parity with production will be part of our next architectural iteration.
Currently planning out the next evolution:
.NET Core
Containers
Cross-Platform - devs with Macs, Parallels
.NET Core should be a game-changer for us. Simplify deployment and let us run apps on Linux, which will be cheaper and let us run them more easily in containers.
Conclusion
To wrap up, what’s helped us the most has been having a microservices platform that’s flexible enough to follow our organizational structure, without getting too caught up on the “micro” prefix. It enables a fast development lifecycle and has helped us effectively scale our product team from 50 to 200 people, and should set us up to grow our team even more moving forward.
A Microservices Architecture That Emphasizes Rapid Development (That Conference)