
Microservices at Mercari


Published on

2017-09-28 thu.
1st Google Cloud INSIDE Games & Apps

Slides from the talk by Taichi Nakashima, SRE at Mercari, Inc.

Published in: Technology

Microservices at Mercari

  1. 1. Microservices at Mercari Current status and challenges
  2. 2. Taichi Nakashima (@deeeet/@tcnksm) SRE at Mercari, automation obsessed, gopher
  3. 3. SRE mission at Mercari ● To ensure a reliable service that is enjoyable to use at anytime ● Takes care of all engineering apart from new service development ○ Performance improvement, automation, security etc
  4. 4. Current Mercari architecture nginx HTTP API API API MySQL MySQL solr solr solr Cache Simple 3-tier + α architecture Single code base
  5. 5. Current Mercari architecture Same architecture in 3 regions JP US UK
  6. 6. Positive ● A central ops team (SRE) can efficiently handle
  7. 7. Challenges
  8. 8. Challenges nginx HTTP API API API MySQL MySQL solr solr solr Cache Simple 3-tier + α architecture Single code base
  9. 9. Challenges nginx HTTP API API API MySQL MySQL solr solr solr Cache Simple 3-tier + α architecture Monolith?
  10. 10. Challenges ● Code is too huge/complex to understand ● Team is too large to efficiently work on shared code base ● Communication overhead is too large ● Velocity (development cycle) is stalled...
  11. 11. Microservices
  12. 12. Microservices? ● Architectural and organizational approach to software development ○ To speed up deployment cycles ○ Foster innovation and ownership ○ Improve maintainability and scalability
  13. 13. Microservices? $ cat inside.txt | cut -f 1 -d ' ' | sort | uniq -c | sort -nr
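The shell pipeline on this slide chains small single-purpose tools, which is the analogy the talk draws for microservices. As a rough sketch of the same idea in Python, each stage below is one small function and the result comes from composing them. The function names are illustrative and not from the talk, and ties are broken alphabetically only to make the output deterministic.

```python
from collections import Counter


def first_field(lines, delimiter=" "):
    """Like `cut -f 1 -d ' '`: keep the text before the first delimiter."""
    for line in lines:
        yield line.rstrip("\n").split(delimiter, 1)[0]


def count_desc(items):
    """Like `sort | uniq -c | sort -nr`: count items, highest count first."""
    counts = Counter(items)
    pairs = [(n, item) for item, n in counts.items()]
    # Highest count first; equal counts are ordered by name (an assumption).
    return sorted(pairs, key=lambda pair: (-pair[0], pair[1]))


def word_frequency(lines):
    """Compose the two stages, as the shell pipe does."""
    return count_desc(first_field(lines))
```

Each function can be replaced or tested on its own, which is the property the "do one thing well" slide asks of each service.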
  14. 14. Microservices ● Do one thing well ○ Unix philosophy ○ One function in one service, not multiple functions in one service ● Decentralized Governance ○ Each team has ownership on each service ● Independent ○ Each service can be changed, upgraded, or replaced independently ● Polyglot ○ Right framework and tool for each domain
  15. 15. Goal ● Software Engineer ○ Keep velocity from stalling; make feature improvement iterations faster instead ○ -> Provide great features to customers faster ● SRE ○ Provide an automated platform for microservices ○ Hand some responsibilities (e.g., deployment, debugging) to software engineers ○ -> Focus more on SRE-related software engineering tasks
  16. 16. Team @deeeet @spensnova @babarot
  17. 17. State of microservices in US
  18. 18. Microservices architecture in US Mercari API HTTP
  19. 19. Microservices architecture in US Gateway API Mercari API HTTP HTTP
  20. 20. Microservices architecture in US Gateway API Mercari API HTTP offer HTTP gRPC
  21. 21. Microservices architecture in US Gateway API Mercari API HTTP search offer HTTP gRPC
  22. 22. Microservices architecture in US Gateway API Mercari API HTTP search personalization offer HTTP gRPC
  23. 23. Technical stacks ● Docker ● Kubernetes (Google Container Engine) ● gRPC
  24. 24. Container ● Resource isolation ● Resource limitation ● Fast boot (vs. VM) Docker ● Easy to build container image ● Easy to distribute via registry
  25. 25. Why Docker? ● Software engineers have more control ○ They can include what they want (e.g., runtime, library) ● Environmental parity ○ What works in local development (or the QA env) is exactly what runs in production (easy to debug) ○ No more “it works on my environment but not in production!” ● Easy to deploy ○ Docker image ≒ Single statically linked binary ○ You already know its benefit if you use Go
  26. 26. Kubernetes (GKE) ● Container orchestration ● Derives from Google's internal systems Borg and Omega ● Inspired and informed by Google’s experiences and internal systems
  27. 27. Why Kubernetes? ● Best way to maximize container benefits ○ Resource isolation/limitation lets us raise compute resource utilization. But how? ■ K8s schedules containers onto appropriate instances ○ How do dynamically scheduled containers communicate? ■ K8s provides service discovery ● Reduces operation costs ○ Self-healing & auto scaling ● Infrastructure of infrastructure ○ Industry standard https://githubengineering.com/kubernetes-at-github ○ More tools/software will be built on top of k8s in the future (I guess)
  28. 28. gRPC ● gRPC Remote Procedure Call ● High performance, general purpose, open source, standards-based RPC framework ● Open source version of Stubby, the RPC framework used inside Google
  29. 29. gRPC ● Simple service definition ○ By default, gRPC uses protocol buffers as the Interface Definition Language (IDL) for describing both the service interface and the structure of the payload messages. ● Works across languages and platforms ○ Write a Go server and a Python client ○ Enables polyglot microservices
  30. 30. Why not REST? ● Who can implement REST correctly? ○ High cost to design (Path? Parameters? hah?) ○ Eventually it’s just HTTP endpoints ● No more HTTP client implementation ..
  31. 31. Challenges
  32. 32. Challenges ● Deployment ● Observability
  33. 33. Deployment ● Deployment is key in a microservices platform ○ “Without velocity stalled, rather make iteration speed faster” ● We need an easy & safe automated deployment system ○ We started with chatbot-style deployment but it did not scale
  34. 34. Spinnaker ● Continuous Delivery platform ● Developed at Netflix ○ Worked with Google and open sourced in 2015 ● Supports multiple clouds ○ Kubernetes!, GCE, AWS
  35. 35. Spinnaker GUI
  36. 36. Spinnaker pipeline
  37. 37. Why Spinnaker? ● Kubernetes support ● Built-in deployment best practice from Netflix and Google ○ Immutable infrastructure ○ Blue/Green deployment, Canary deployment ○ Manual judgement (by manager) phase ○ Run integration tests
  38. 38. Spinnaker in Mercari ● Currently only for container deployment to Kubernetes ● Each team uses Spinnaker to deploy their own services ● One Spinnaker handles all microservices in all regions
  39. 39. Example pipeline of API gateway deployment (Canary)
  40. 40. One Spinnaker cluster manages Mercari global deployment JP US UK
  41. 41. Future of Spinnaker ● Pipeline as Code ○ https://github.com/spinnaker/dcd-spec ● Automated canary analysis
  42. 42. Automated canary analysis https://blog.spinnaker.io/can-i-push-that-building-safer-low-risk-deployments-with-spinnaker-a27290847ac4
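As a rough illustration of what an automated canary judgment might look like, the sketch below compares the canary's error rate with the baseline's and returns pass, fail, or "not enough data". This is not Spinnaker's algorithm. The function name, thresholds, and the single error-rate metric are all assumptions made for illustration.

```python
def canary_passes(baseline_errors, baseline_total, canary_errors, canary_total,
                  max_ratio=1.5, min_requests=100):
    """Return True/False for pass/fail, or None when there is too little traffic.

    All thresholds are hypothetical defaults, not values from the talk.
    """
    if baseline_total < min_requests or canary_total < min_requests:
        return None  # not enough traffic to judge either side
    baseline_rate = baseline_errors / baseline_total
    canary_rate = canary_errors / canary_total
    # A small absolute floor keeps a zero-error baseline from failing every canary.
    allowed = max(baseline_rate * max_ratio, 0.001)
    return canary_rate <= allowed
```

A real analysis would compare many metrics with statistical tests; the point here is only that the manual judgement step can be replaced by a computed decision.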
  43. 43. Observability Observability (logging, metrics & tracing) is important ● Each team needs to debug its own services by itself, without SSH ● It’s harder and more complex than with a monolith
  44. 44. Stackdriver logging
  45. 45. Request ID in log ● Which service caused the problem in a given request?
  46. 46. Request ID in log Gateway API Mercari API HTTP search personalization offer HTTP gRPC ① Generate unique ID ② Annotate logs with the ID within the same request HTTP header / gRPC metadata
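The two steps on this slide (generate a unique ID at the gateway, then annotate every log line in the same request with it) can be sketched with the Python standard library alone. The header name `x-request-id`, the logger names, and the helper functions are assumptions for illustration; the talk does not say which key Mercari uses.

```python
import contextvars
import io
import logging
import uuid

REQUEST_ID_HEADER = "x-request-id"  # assumed HTTP header / gRPC metadata key

_request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record):
        record.request_id = _request_id.get()
        return True


def ensure_request_id(headers):
    """Step 1: reuse the incoming ID if present, otherwise generate one."""
    request_id = headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    _request_id.set(request_id)
    return request_id


def outgoing_headers():
    """Forward the same ID on calls to downstream services."""
    return {REQUEST_ID_HEADER: _request_id.get()}


def build_logger(name, stream):
    """Step 2: a logger whose lines always carry the request ID."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(name)s request_id=%(request_id)s %(message)s"))
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    return logger


def demo():
    """Simulate gateway -> service X sharing one ID; return the captured log text."""
    buf = io.StringIO()
    gateway = build_logger("gateway", buf)
    service = build_logger("service-x", buf)
    ensure_request_id({})            # gateway: no incoming ID, so generate one
    gateway.info("request received")
    forwarded = outgoing_headers()   # sent as HTTP header or gRPC metadata
    ensure_request_id(forwarded)     # service X: reuse the forwarded ID
    service.info("handling request")
    return buf.getvalue()
```

Searching the aggregated logs for one `request_id` value then returns the lines from the gateway and from every service that handled that request.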
  47. 47. Request ID in log Search by request ID Log from gateway Log from service X
  48. 48. Distributed tracing ● Which service makes the request slow?
  49. 49. Stackdriver tracing
  50. 50. Metrics The choice of metrics service/software is still under discussion & trial ● First-class support for containers and Kubernetes ● Integration with the Kubernetes ecosystem ○ Spinnaker, Istio and so on ● Service dependency visualization
  51. 51. Prometheus + grafana
  52. 52. Datadog
  53. 53. Instana
  54. 54. State of microservices in JP
  55. 55. State of microservices in JP JP has just started ● Some services (machine learning products) have started to be containerized and deployed on GKE ● On-going discussion about the best architecture
  56. 56. Conclusion ● Why did we start microservices? ● Current state of US microservices and challenges
  57. 57. We’re hiring ● Who loves automation ● Technical keywords ○ Docker ○ Kubernetes ○ gRPC ○ Golang ○ Container monitoring
  58. 58. Spinnaker is deployed on GKE
  59. 59. Testing Is testing in microservices hard? ● Focus on unit tests as usual ○ Because each service is supposed to be independent ○ Each microservice must measure test coverage ● Integration tests? ○ Use mocks instead of working hard to prepare a local env
  60. 60. Testing pyramid Google Testing Blog: Just Say No to More End-to-End Tests Do this a lot ! Do mock
  61. 61. QA environment How to test development feature from QA device? ● Pull request (PR) based pod creation
  62. 62. PR based pod creation Proxy API gateway (master) API gateway (PR 313) API gateway (PR 314) Proxy by PR number Set PR number Container is deployed via CI
  63. 63. PR based docker container (QA env) Easy to switch
  64. 64. PR based pod creation Proxy API gateway (master) API gateway (PR 313) API gateway (PR 314) Service A (master) Service A (PR 21) Proxy by PR number Set PR number Container is deployed via CI
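The "proxy by PR number" step can be sketched as a small routing function: the QA device sets a PR number on the request, and the proxy picks the matching backend or falls back to master. The header name and backend naming scheme below are hypothetical; the slides do not show how the proxy is actually configured.

```python
import re

PR_HEADER = "x-pr-number"                # hypothetical header set by the QA device
DEFAULT_UPSTREAM = "api-gateway-master"  # hypothetical backend name


def upstream_for(headers, deployed_prs):
    """Pick the API gateway backend for one request.

    `deployed_prs` is the set of PR numbers whose containers CI has deployed.
    Unknown, missing, or malformed PR numbers fall back to master.
    """
    value = headers.get(PR_HEADER, "").strip()
    if re.fullmatch(r"[0-9]+", value) and int(value) in deployed_prs:
        return f"api-gateway-pr-{int(value)}"
    return DEFAULT_UPSTREAM
```

For example, with PRs 313 and 314 deployed, a request carrying `x-pr-number: 313` goes to the PR 313 gateway, while a request with no header goes to master.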
  65. 65. Future works
  66. 66. Service mesh Don’t trust each other! ● Traffic management ○ API rate limit, circuit breaker ● Policy enforcement ○ Ensure access policies (which service can access which service?) We should realize above without modifying client/server code!
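To make the circuit breaker item concrete, here is a minimal sketch of the behavior a mesh proxy would apply on a service's behalf. It is written as an in-process class only for illustration; the slide's point is that this logic should live outside client/server code. The class name, thresholds, and the simplified half-open rule are assumptions.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable for tests
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: allow one trial call (half-open).
            # A single failure now reopens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

While the circuit is open, callers get an immediate error instead of piling more load onto a dependency that is already failing.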
  67. 67. Service mesh (Istio) https://istio.io/
  68. 68. Service mesh (Istio)
  69. 69. Chaos engineering ● Real world is hard … ○ Machines crash, networks are unstable (especially in distributed systems) ● A dependent service can fail at any time
  70. 70. Chaos engineering ● Services must be fault tolerant when something goes wrong ● Emulate real world problems ○ We need to identify weaknesses ■ Improper fallback settings when a service is unavailable ○ Software engineers should be aware of them
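One weakness named above, an improper fallback when a service is unavailable, can be checked by injecting failures on purpose. The sketch below wraps a dependency call so that it fails at a chosen rate, then exercises a caller that is expected to fall back. All names are hypothetical, and this is an in-process toy, not how Chaos Monkey works (that tool terminates instances).

```python
import random


class InjectedFault(Exception):
    """Raised by the fault injector instead of calling the dependency."""


def with_faults(func, failure_rate, rng=None):
    """Wrap `func` so that a fraction `failure_rate` of calls fail."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


def recommendations(user_id, fetch, fallback=()):
    """Return personalized items, or a static fallback if the dependency fails."""
    try:
        return list(fetch(user_id))
    except Exception:
        return list(fallback)
```

Running the caller against `with_faults(fetch, failure_rate=1.0)` shows whether it still returns something usable when the dependency is completely down.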
  71. 71. Chaos engineering (Chaos monkey) https://github.com/Netflix/chaosmonkey
