A Microservices Architecture That Emphasizes Rapid Development (That Conference)

Slides from That Conference, August 8-10, 2016.

  • 5-10 mins for Q at end. Ask via link. Vote.

    Context = important. This presentation is descriptive, not prescriptive.

    Netflix, a huge, global video streaming service, is going to do things differently than a government contract shop; a 1000-person development team will approach problems differently from a 10-person team.

    How many users do we have? What kind of users are they? What are their needs for availability, performance, etc.?

    Don’t get caught up in what’s right and wrong; it’s trade-offs, and those trade-offs depend on who you are, what you do, and why you do it.

    Background up front to establish context, and talk about things like org structure / team size throughout to help frame how we thought about our problems in our situation at the time.

  • My name’s Rob, engineer at Hudl on our Foundation tribe.
  • Build the platform and lifecycle that helps squads across the rest of the company ship code and products quickly and safely.

    On microservices transition team about 2.5 years ago.
  • At Hudl, we build sports software that helps teams and athletes win.

    Teams record video at their games and practices.
  • They upload that video to hudl.com,
    we provide tools to help them break down and analyze the video they record.
    share it with coaches, athletes, and analysts…
  • who use it to better understand their game,
    build reports to identify tendencies and areas to improve in the future.
  • Highlights for athletes and teams to showcase and share.
  • Notification and event feeds for team communication...
  • Playbooks that integrate with video

    Significant number of other features.
  • Our customers. These are the people we solve problems for.
  • 4.5MM active users in dozens of sports in over 70 countries
    130k teams from small, local basketball league teams all the way up to elite teams in leagues like the NFL, EPL
    10k reqs/second during american football season
    Mainly C# / JS / MongoDB - Running on IIS on windows server, a number of frameworks (ASP.NET MVC, React). A lot of supplementary tech as well for cache, queue, etc.
    All cloud, AWS. No on-premises hosting or datacenters.
  • Small, cross-cutting teams that focus on a particular domain.
    Developers, designers, QA, Product Manager pave their own way and ship at their own speed.
    Volleyball squad that figures out what they need to do to solve customers’ problems.
    Most squads ship early and often. MVP: playbook @ 400 deploys and counting
    Anyone can deploy; in fact, most often our QA or PMs are the ones pushing code to production
    Deploys: zero downtime, fast and easy; moving fast comes with some risk, and the ability to react to that quickly is important
    No gatekeepers or throwing-over-the-wall; squads operate their services;
    If memory leak, they’re the ones remoting on and digging into it

  • We measure something called Branch Lifecycle. Branch is the atomic unit of “shipping” for us, want to know how long it takes a branch to get from creation to production.

    Histogram breaking down that lifecycle duration for all the branches our teams have shipped in the last 30 days.
    The blue bars are histogram buckets for the first 24 hours, and the green bars are buckets for each day afterward.
    Example: Just over 60 branches have gone from created to shipped in less than one hour
    Median lifecycle 32 hours; half of our teams’ branches go out in < 1.5 days (a rough sketch of the bucketing and median calculation follows below).

    My goal is to make sure we have a platform that enables a wide spectrum here. Don’t force the fast iterations, but enable them.

    TRANSITION: For us, arch/plat/lifecycle is a means to keep us nimble as a product team so we can deliver fixes and features to our coaches and athletes quickly. Transition to microservices was one thing that helped us achieve that as we’ve grown our team.
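
    A rough sketch of that bucketing and the median, assuming we have each shipped branch’s lifecycle as a duration (illustrative only, not our actual reporting code):

    // Illustrative sketch: lifecycles under 24 hours land in hourly buckets,
    // everything after that in daily buckets; also reports an approximate median.
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class BranchLifecycleReport
    {
        static string Bucket(TimeSpan lifecycle) =>
            lifecycle < TimeSpan.FromHours(24)
                ? $"{(int)Math.Ceiling(lifecycle.TotalHours)}h"   // 1h, 2h, ..., 24h
                : $"{(int)Math.Ceiling(lifecycle.TotalDays)}d";   // 2d, 3d, ...

        public static void Summarize(IReadOnlyList<TimeSpan> lifecycles)
        {
            var buckets = lifecycles
                .GroupBy(Bucket)
                .OrderBy(g => g.Min())
                .Select(g => new { Label = g.Key, Count = g.Count() });

            foreach (var bucket in buckets)
                Console.WriteLine($"{bucket.Label}: {bucket.Count}");

            var sorted = lifecycles.OrderBy(t => t).ToList();
            Console.WriteLine($"Median: {sorted[sorted.Count / 2].TotalHours:F0} hours");
        }
    }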
  • Ask question
    If you’re interested in talking specifics about your architecture and how you think about microservices, find me afterward and I’d love to chat more about it
  • Transition: don’t take microservices lightly
  • Architectural Complexity

    Need teams to build these and support them if they break. You need test and local development environments that can work with this model.
    Your company’s business is probably not to “build a microservices architecture.” Time spent away from building products for your customers, and doing the things that actually make your business money.

    Runtime complexity, also. Change the way you think about writing code to be more tolerant of individual service failures. Build your application differently.

    The universe you have to invent and live in for microservices is much more complex. Don’t take it lightly.
  • There are existing tools. Buy or OSS. There are smaller pieces that you can build with, or complete PaaS offerings. Read, read, read.

    I will be prescriptive here. You shouldn’t try to invent the universe yourself. Don’t go writing a strongly-consistent service registry yourself. Solved problems, leverage them.

    C# + Windows made this challenging for us, which is why we hand-rolled some.
    Custom deployment was one of those; IIS + Windows, and Windows servers are slower to start, so rolling deploys made our deploy process slower; swap out deployment payloads on existing instances

    Know the trade-offs. How much control do you need over the pieces? How complex is it to operate?
  • TRANSITION: Tons of advantages and disadvantages between the two. Highlight a couple that are important to us relative to dev speed.
  • We do this a lot. Anyone can make changes to others’ codebases.

    I’ve done a lot of development on our users service (manages users and authentication) but team mgmt may want to make a change to our role management.

    Awesome! Work I don’t have to do. No work order, wait for backlog, hope it was to spec.

    Code reviews. Probably different from some orgs that have more rigid ownership.

    Monolith Advantage
  • Recruit - Athletes play at the next level of their career, college recruiting programs
    Profiles - Public athlete pages, showcase highlights and other strengths

    Profiles squad new feature, show colleges an athlete is being recruited to on their profile.

    Add some data and query it a bit differently on the Recruit side.

    Example: recruiting college, display on athlete profile page
  • Today, these two domains are microservices for us.

    Data layer changes on recruit
    Expose via API. Contract, versioning, backwards compatibility
    Get that code pushed first
    Make changes on profiles, now it’s an API call over the network. Data transport, service discovery. Have to consider failure, set timeouts.
    Render to view.
    If bug found during testing, especially on the recruit changes, do that dance all over again.

    Monoliths have advantage, much simpler, much faster to code.
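
    To make the microservice side concrete, a minimal sketch of what the Profiles call might look like, with the timeout and failure handling we now have to think about (the interface and DTO names here are hypothetical, not our real contract):

    // Hypothetical sketch: Profiles calling the Recruit service across a service
    // boundary. We set a timeout and degrade gracefully so the profile page
    // still renders if the Recruit call fails or is slow.
    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    public class RecruitingCollegeDto
    {
        public string CollegeName { get; set; }
    }

    public interface IRecruitService
    {
        Task<IReadOnlyList<RecruitingCollegeDto>> GetCollegesForAthlete(
            string athleteId, CancellationToken token);
    }

    public class AthleteProfileBuilder
    {
        private readonly IRecruitService _recruit;

        public AthleteProfileBuilder(IRecruitService recruit) => _recruit = recruit;

        public async Task<IReadOnlyList<RecruitingCollegeDto>> GetRecruitingSectionAsync(string athleteId)
        {
            using (var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(500)))
            {
                try
                {
                    return await _recruit.GetCollegesForAthlete(athleteId, cts.Token);
                }
                catch (Exception ex) when (ex is OperationCanceledException || ex is HttpRequestException)
                {
                    // Profile still renders, just without the recruiting section.
                    return Array.Empty<RecruitingCollegeDto>();
                }
            }
        }
    }

    The exact policy matters less than the fact that every cross-service call now needs an answer for what happens when it fails or runs slow.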
  • Describe old monolith deploy lifecycle

    For us, 30m each. If problems or rollback, longer. Blocks everyone.
  • Loosely coupled and independently deployable, many queues

    Also “blast radius”: a problem with a deployment or code change is limited in scope.

    TRANSITION: These two concepts (cross-domain dev, deploy orchestration) are both important to us.
    Want to allow teams to have the flexibility to quickly make changes across domains
    Want to be able to get code out to production quickly when it’s ready to go

    Second one, having to queue up for deploys, was one of the biggest catalysts for our transition



  • Code deploys / week since 2012
    Test deploys, prod deploys
    Orange line, max @ 90/week (18/day)
    Vertical line, first micro in Jan 2014

    Why was that becoming more of a problem for us at that time?
  • Had been slowly growing our development team. Blue line is product team size.

    From a small team of maybe a dozen developers or so to a slightly larger team of 46 contributors to the same code base. 6-7 squads, each working on a different domain

    Knew we were going to hire a lot in the coming years, so it was only going to get worse.

    Monolith for 6-7 years. Had reorganized our teams to the point where the underlying arch was fighting organizational structure.
  • Remnant of a smaller company with a single team split among tasks in the same website. Narrower customer focus (US HS FB coaches and athletes).
  • To follow Conway, we’d need something like this.

    Break it up, let teams create and work on their services.

    Pause, transition.

    Go into some detail on these blue boxes, these microservices that our teams work on today. Describe the architecture, and a couple things we do to let team work on them quickly.
  • Describe speedtest, coach runs, we persist, load in admin.

    Support asks where uploading from, suggests alternatives
  • Full-stack microservices, everything a squad needs to build an application end-to-end and deliver it as a service.

    ASP.NET MVC
    Serves static resources for client-side apps, views, APIs, and inter-service endpoints.
    Data layer

    Stateless at runtime, nothing that can’t be reconstructed quickly on startup. State and data is stored in databases like MongoDB, caches, external queueing systems, etc.
    Easier to manage lifecycle in production.
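
    As a hypothetical sketch (names and routes are illustrative, not our real speedtest code), a single classic ASP.NET MVC controller in one of these services can serve both the page and the API:

    // Illustrative full-stack microservice controller: the same small app
    // serves the HTML view a coach loads and the JSON API the client-side
    // app calls. No state lives on the web server itself.
    using System.Web.Mvc;

    public class SpeedTestController : Controller
    {
        // Page a coach loads at /speedtest
        public ActionResult Index()
        {
            return View();
        }

        // Client-side app hits something like /api/v2/speedtest/latest
        public ActionResult Latest(string userId)
        {
            var latest = new { UserId = userId, MegabitsPerSecond = 42.0 }; // stand-in data
            return Json(latest, JsonRequestBehavior.AllowGet);
        }
    }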

  • NGINX in front. Takes incoming web requests and does routing and load balancing.

    Can autoscale clusters independently based on load or seasonality
  • 35 different clusters, each covering a domain. One of these is still our monolith, still co-existing alongside all of the newer microservices we create.
  • Run several NGINX nodes for capacity and fault tolerance.

    Amazon’s Elastic Load Balancer in front

    In triplicate, one in each of three amazon AZs
  • NGINX for both load balancing and smart routing

    It looks at the requested path (/speedtest) and needs to figure out where to send it, because only a few servers specific to that application can field the route.
    Routes coded into service itself
  • Easy for devs to add new routes, it’s coded right into their app
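
    Slide 17 shows the declaration side. As an illustration of the idea only (not our actual tooling), those declared patterns could be turned into NGINX location blocks that proxy to the service’s cluster:

    // Illustrative only: turn a service's declared route patterns into NGINX
    // location blocks for that service's upstream. The upstream naming and
    // emitted format are assumptions, not our real config generator.
    using System.Collections.Generic;
    using System.Text;

    public static class NginxRouteEmitter
    {
        public static string Emit(string serviceName, IEnumerable<string> routePatterns)
        {
            var sb = new StringBuilder();
            foreach (var pattern in routePatterns)
            {
                sb.AppendLine($"location ~ ^/{pattern}$ {{");
                sb.AppendLine($"    proxy_pass http://{serviceName};");
                sb.AppendLine("}");
            }
            return sb.ToString();
        }
    }

    // Example:
    //   NginxRouteEmitter.Emit("speedtest", new[] { "speedtest.*", "api/v2/speedtest/.*" });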
  • Isolated mongo, admin page doesn’t reach into speedtest mongodb. Contract.

    Describe diagram

    Mitigate the pain that’s introduced into the cross-domain development workflows.

    For us, important to make this easy for developers, so they don’t have to re-invent service discovery, serialization, and load balancing.
  • On the Speedtest side, publish an interface with service methods.
    Automatically build a Nuget package that contains the interfaces and types - lightweight contract package
  • If our admin service wants to make this call, they import the client package
    Use it as a Type passed to a ServiceLocator that we’ve written.
    ServiceLocator intercepts the call
    uses attributes on the interface to help locate the service with its internal route table built from Eureka
    inject an HTTP call and make it over to speedtest
    deserializes the result back into the DTO

    Looks RPC-like, but the interceptor does JSON over HTTP.

    Lets us solve a lot of problems for the caller: Retries, Load Balancing, Health Checks; patterns like Circuit Breaker

    Abstracted away the complexity and lets teams get to work on the important stuff: solving problems for our users and not solving problems specific to the architecture they’re building on.
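
    The caller-side code is on slides 18-19; under the hood it boils down to roughly this kind of hand-written HTTP client (the DTO shape, URL format, and retry policy here are assumptions for illustration, not our actual Bifrost internals):

    // Illustrative sketch of what the intercepted call effectively does:
    // JSON over HTTP with a timeout and a single retry (the call is marked
    // [Idempotent] on the interface, so a retry is safe).
    using System;
    using System.Net.Http;
    using System.Text.Json;
    using System.Threading;
    using System.Threading.Tasks;

    public class SpeedTestOverallResultDto
    {
        public string UserId { get; set; }
        public double MegabitsPerSecond { get; set; }
    }

    public class SpeedTestResultClient
    {
        private readonly HttpClient _http;

        // In the real system the address comes from the Eureka-backed route
        // table; here it's just passed in directly.
        public SpeedTestResultClient(Uri serviceBaseAddress) =>
            _http = new HttpClient { BaseAddress = serviceBaseAddress, Timeout = TimeSpan.FromSeconds(2) };

        public async Task<SpeedTestOverallResultDto> GetLatestResultForUser(
            string userId, CancellationToken token = default)
        {
            for (var attempt = 0; ; attempt++)
            {
                try
                {
                    var response = await _http.GetAsync(
                        $"speedtest-result/get-latest-result-for-user?userId={Uri.EscapeDataString(userId)}",
                        token);
                    response.EnsureSuccessStatusCode();
                    var json = await response.Content.ReadAsStringAsync();
                    return JsonSerializer.Deserialize<SpeedTestOverallResultDto>(json);
                }
                catch (HttpRequestException) when (attempt == 0)
                {
                    // One retry on transient failure; the real interceptor also
                    // layers in load balancing, health checks, and circuit breaking.
                }
            }
        }
    }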
  • TRANSITION: Microservices have been pretty helpful for us in terms of scalability - I mentioned earlier how we can scale services independently, which is really convenient.

    However, in our experience, microservices have been much more beneficial for scaling our organization and teams than for scaling the application itself.

    Not downplaying other advantages of microservices. App scalability (# of users, amount of data) is a solvable problem in almost any architecture - Stack Overflow is a good example: code optimization, hardware, tuning.

    670MM pageviews / month on a small, powerful setup: 9 web servers (a few different code bases), 1/1 HAProxy, 2/2 SQL.

    http://stackexchange.com/performance

    Microservices let us add more teams or restructure the teams we have, and allow our architecture to flow with that restructuring
  • Inspired by Spotify. Organized into tribes; 4 primary tribes, about 50 people per tribe on average.
    Tribes are a large business unit. Have a tribe focused on media and fan-facing content. Another focused on Coaching Tools for all sports.
    Tribe is composed of squads, cross-cutting teams with dev/QA/PM/design. CT tribe has Football, Basketball, Volleyball

    We re-organize fairly frequently. New business opportunities, or shifting business focus. Solve different problems or build new products.
    In practice, because we re-org squads frequently, services don’t really line up 1:1 with squads. Temporal ownership.
    Squads tend to own their services while they’re devving and releasing on them. Inevitable that we shift squads.
    Tribes own ops duty and alerting. Each has an on-call rotation they manage.
    Responsible for making sure they’re staying up to date with new library versions

    Organizational restructuring means a loosely-coupled relationship between squads and services,
    and is also one of the primary reasons that new microservices are introduced

    A few other reasons that new services get created.
  • What causes introduction of new services?
    Structural/Direction:
    Reorganization (basketball -> split)
    Often implies new product development by existing squad (conversion -> getpaid [new signup])
    Shifting project focus on a squad (platform -> users)
    Adding/replacing new functionality to an existing service (recruit + recruitsearch)

    Reactionary, often because of service getting too large:
    Deploy queue
    Build times
    Not wanting to work in the monolith (10k files, 2MM lines)

    Targeted migration; still have monolith that handles a good number of domains; if we feel like that’s a risk as our product changes, we’ll migrate code/data out.
  • Great example that combines several of those reasons, both reorganization and reactionary,
    can be seen in the way we’ve built our basketball product over the last couple years.

    Deploys/week, basketball. Introduce video, reports, library, record.

    That service grew quite large. Jokingly called a “minilith” or “microlith”.

    If we had known at the start that we’d end up splitting off several services, would we have architected it that way from the beginning? I don’t think we would have.

    Architecting multiple services up front wouldn’t have let us prove or disprove that product as quickly as we were able to.
    Letting it grow larger let the team leverage faster cross-domain development
  • Basketball, not “micro” for sure, and that’s okay with us.

    Loose - independent and isolated
    Service - communicate by contract
    Bounded - understood, intentionally scoped domain

    We don’t prescribe a max size for services, and have a range of differently-sized services.
  • Speedtest / Users / Recruit

    Mega services, still an intentional scope or domain.
  • “Monolith” gets a bad rap, and the word “micro” doesn’t need to be the emphasis when talking microservices;
    services can grow large and still adhere to these principles.

    That lets us leverage the benefits of microservices while getting some of the development speed strengths of the monolith when it comes to MVPing, experimentation, and cross-domain development.
  • Parity
    Differences between the way our prod and test/dev environments are set up. Pulsar + local NGINX, co-locate apps on test servers to save cost.
    A lot of maintenance and operational overhead in test, and bringing closer parity with production will be part of our next architectural iteration.

    Currently planning out the next evolution:
    .NET Core
    Containers
    Cross-Platform - devs with macs, parallels

    .NET Core should be a game-changer for us. Simplify deployment and let us run apps on Linux, which will be cheaper and let us run them more easily in containers.

    Conclusion
    To wrap up, what’s helped us the most has been having a microservices platform that’s flexible enough to follow our organizational structure, without getting too caught up on the “micro” prefix. It enables a fast development lifecycle, has helped us effectively scale our product team from 50 to 200 people, and should set us up to grow even more moving forward.
  • Slide transcript:

    1. A Microservices Architecture That Emphasizes Rapid Development
    2. @robhruska
    3. @HudlEngineering
    4. Customers ○ Coaches ○ Athletes ○ Recruiters ○ Video Coordinators ○ Analysts ○ Parents ○ Alumni ○ Fans
    5. 4.5MM active users 130K teams 10K reqs/second C# / JavaScript / MongoDB Amazon Web Services
    6. Culturally Fast / Rapid ○ Small cross-cutting squads ○ Ship early, ship often; MVP ○ Anyone can deploy, anytime ○ Deploys, rollbacks are fast and easy ○ Low friction for service operation
    7. Branch Lifecycle Duration (histogram)
    8. Exploring ► Implementing ► Improving
    9. “If you wish to make an apple pie from scratch, you must first invent the universe.” (Carl Sagan)
    10. Problems to Solve □ Configuration □ Deployment □ Routing □ Service Discovery □ Dev. Lifecycle □ Monitoring / Tracing □ Logging □ Testing □ Security □ ...
    11. Monoliths vs. Microservices
    12. Monoliths vs. Microservices: Cross-Domain Development
    13. Monoliths vs. Microservices: Deployment Workflows
    14. Deploys / Week by Environment (chart)
    15. Total # Product Team Members / Week (~46 contributors)
    16. “Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.” (Melvin Conway)
    17. public class RouteConfig : IRouteConfig
        {
            public IEnumerable<string> GetApplicationRoutes()
            {
                return new List<string>
                {
                    "speedtest.*",
                    "api/v2/speedtest/.*",
                    "bifrost/speedtest/.*",
                    "scripts/speedtest/.*",
                    "bundles/speedtest/.*",
                    "css/speedtest/speedtest.css",
                };
            }
        }
    18. [BifrostService]
        public interface ISpeedTestResultService
        {
            [Path("speedtest-result/get-latest-result-for-user")]
            [Idempotent]
            Task<SpeedTestOverallResultDto> GetLatestResultForUser(
                string userId, CancellationToken? token = null);
        }
    19. using Hudl.Bifrost.Location;
        using Hudl.SpeedTest.Client.Services;

        var result = await ServiceLocator.Get<ISpeedTestResultService>()
            .GetLatestResultForUser(userId);
    20. Architecture Flows With Organizational Structure
    21. New Service Introduction: Team Reorganization ○ New squads / business focus ○ Shifting focus / domain ○ Replacing functionality; Reactionary ○ Deploy queue ○ Build times ○ Code size ○ Targeted migration
    22. Deploys / Week, Basketball
    23. Microservices: Loosely coupled, Service oriented, Bounded contexts
    24. speedtest 14 files < 1000 LOC 1 Page 0 APIs 1 Endpoint users recruit 222 files 44k LOC 3 Pages 8 APIs 78 Endpoints 1900 Files 400k LOC Dozens of pages & APIs 64 Endpoints
    25. # Services by Size (Tiny / Small / Medium / Large / Mega)
    26. Microservices: Loosely coupled, Service oriented, Bounded contexts
    27. Lessons Learned & The Future
    28. Thanks! @robhruska robhruska.com github.com/robhruska @HudlEngineering hudl.github.io public.hudl.com/bits/
