ACTION! DEVELOPMENT AND OPERATIONS FOR STICKER SHOP
Haruki Sato, LINE
@singing_hacky
https://github.com/haruki-sugarsun
SELF-INTRODUCTION: HARUKI SATO
• Software Engineer working on LINE “Shop” products, e.g. Sticker, Theme, etc.
• Leading the LINE Shop development team
• Source Code Management
• Weekly Releases
• SRE-like approaches with
  • Monitoring
  • Load Balancing
  • Outage handling & Postmortem
Agenda
SELF-INTRODUCTION: THE TEAM
• In this session, we focus on server-side development.
SELF-INTRODUCTION: THE TEAM
• Fukuoka: ~7 engineers
• Tokyo: ~7 engineers
• Plus Client, Web FE, QA, ...
STORE
Sticker
Theme
…& more
SELF-INTRODUCTION: THE PRODUCTS
Java
JavaScript
Python
…
5 years
MySQL
MongoDB
Redis
...
SELF-INTRODUCTION: THE PRODUCTS
• This is NOT the standard development flow at LINE.
• It is just a case study by the LINE Shop team.
• Each team at LINE has its own way of working, depending on its needs.
• We are NOT doing everything perfectly.
DISCLAIMER
SOURCE CODE MANAGEMENT
• GitHub Enterprise
• Single repository
  • Even for multiple products
  • We use a gradle multi-project build.
• “github flow”-like branch management
  • `master` is the only main branch.
  • Each member opens Pull Requests against `master`.
  • We create a `release-*` branch for each release.
    • Will be discussed later.
SOURCE CODE MANAGEMENT
SINGLE REPOSITORY SKETCH
• Able to share definitions
  • API
  • Configurations
• Able to reduce code duplication
  • Define common libraries
• Easier version management
  • We only need 1 version == git SHA1 hash (at least in Shop world :P)
  • Easy to minimize version mismatches among microservices
• Simple release process
WHY SINGLE REPOSITORY?
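A minimal sketch of why a single version helps: if every service is built from the same repository, a version mismatch between services can be detected by comparing commit hashes. The service names and abbreviated hashes below are hypothetical and only illustrate the idea. This is not the Shop team's actual tooling.

```python
def group_by_version(deployed):
    """Group service names by the commit hash they are running."""
    groups = {}
    for service, sha in deployed.items():
        groups.setdefault(sha, []).append(service)
    return groups


# Hypothetical services and abbreviated commit hashes.
deployed = {
    "sticker-api": "a1b2c3d",
    "theme-api": "a1b2c3d",
    "store-web": "9f8e7d6",
}

groups = group_by_version(deployed)
# More than one distinct hash means the services are not on the same version.
has_mismatch = len(groups) > 1
```

With one repository the check is a plain string comparison. With one repository per service, we would need a compatibility table between independent version numbers.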
BRANCH MANAGEMENT SKETCH
WEEKLY RELEASE
• Create a `release-*` branch every Thursday.
• Deploy the `release-*` branch to the STAGING environment.
• Work with QA to run a “regression test” in the BETA & STAGING environments.
  • To verify that the features work well, without any regression.
  • If we find a problem, we fix it in the `master` branch and cherry-pick the fix to `release-*`.
• After QA sign-off, we deploy the `release-*` branch to the REAL environment every Wednesday.
WEEKLY RELEASE
• (The `master` branch is always automatically deployed to BETA.)
• Jenkins + “gradle plugin” (for the LINE-internal deployment system)
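The weekly cycle above can be sketched as a small date calculation. This assumes the REAL deployment happens on the first Wednesday after the Thursday branch cut; the slides do not state this explicitly. The function names are illustrative only.

```python
from datetime import date, timedelta

# date.weekday() numbering: Monday is 0, so Wednesday is 2 and Thursday is 3.
WEDNESDAY, THURSDAY = 2, 3


def next_weekday(day, weekday):
    """Return the first date on or after `day` that falls on `weekday`."""
    return day + timedelta(days=(weekday - day.weekday()) % 7)


def release_cycle(today):
    """Return (branch cut date, REAL deploy date) for the next cycle."""
    branch_cut = next_weekday(today, THURSDAY)
    # Assumption: deploy on the first Wednesday strictly after the branch cut.
    real_deploy = next_weekday(branch_cut + timedelta(days=1), WEDNESDAY)
    return branch_cut, real_deploy


cut, deploy = release_cycle(date(2018, 11, 5))  # 2018-11-05 is a Monday
qa_window_days = (deploy - cut).days
```

Under this assumption, QA has six days between the branch cut and the REAL deployment.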
WEEKLY RELEASE SKETCH
• Able to minimize the pending changes
  • Your code change usually goes live within a week.
  • (I cannot remember what I wrote a month ago… :P)
  • We need to use “flags” to control the features.
• Easy to minimize version mismatches among microservices
  • Ideally, the diff between deployed versions is less than one week of changes.
• Even if we have no change in our server program, we want to upgrade the dependencies as often as possible.
WHY WEEKLY RELEASE?
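Because unfinished code ships every week, features need to be switched on and off independently of deployment. Below is a minimal sketch of such a flag check. The class, the flag name, and the returned titles are hypothetical; the slides do not describe how Shop implements its flags.

```python
class FeatureFlags:
    """Holds the set of feature names that are currently switched on."""

    def __init__(self, enabled=None):
        self._enabled = set(enabled or [])

    def is_enabled(self, name):
        return name in self._enabled


def ranking_title(flags):
    # The new code path is already deployed, but stays dark until the flag is on.
    if flags.is_enabled("new-ranking"):  # hypothetical flag name
        return "Ranking (new)"
    return "Ranking"


title_before_launch = ranking_title(FeatureFlags())
title_after_launch = ranking_title(FeatureFlags(["new-ranking"]))
```

In practice the flag state would come from configuration rather than a constructor argument, so that a feature can be enabled without another release.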
SRE-LIKE APPROACHES
• Shop uses “Zipkin” and “Micrometer”, both integrated with Armeria
  • https://zipkin.io/ for API tracing
  • https://micrometer.io/ for server-side metrics
• We collect this data into Elasticsearch, Prometheus, and IMON
  • IMON is an internal project for logging/metrics collection.
• When we run a “load test”, the server-side metrics are also very useful.
MONITORING
• We use both
  • Standard (L4) load-balancer
    • with L3DSR etc.
  • Client-side load-balancer integrated in Armeria
    • https://line.github.io/armeria/apidocs/index.html?com/linecorp/armeria/client/endpoint/package-summary.html
• These techniques support multiple backends, health checks, and failover
  • So that we can release services without downtime.
  • e.g. rolling restart, Blue/Green deployment.
  • (Actually, we have not tried Blue/Green yet.)
LOAD BALANCING
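A minimal sketch of how client-side load balancing with health checks allows a rolling restart without downtime. This is plain Python for illustration and is NOT Armeria's API; the class, the host names, and the `restart` callback are all hypothetical.

```python
import itertools


class RoundRobinClient:
    """Picks healthy endpoints in round-robin order."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.healthy = set(self.endpoints)
        self._counter = itertools.count()

    def mark(self, endpoint, is_healthy):
        """Record a health-check result for one endpoint."""
        if is_healthy:
            self.healthy.add(endpoint)
        else:
            self.healthy.discard(endpoint)

    def select(self):
        """Return the next healthy endpoint, skipping unhealthy ones."""
        candidates = [e for e in self.endpoints if e in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy endpoint available")
        return candidates[next(self._counter) % len(candidates)]


def rolling_restart(client, restart):
    """Restart endpoints one at a time, taking each out of rotation first."""
    for endpoint in client.endpoints:
        client.mark(endpoint, False)  # health check fails, traffic moves away
        restart(endpoint)
        client.mark(endpoint, True)   # health check passes again


client = RoundRobinClient(["host-a", "host-b", "host-c"])  # hypothetical hosts
observed = []


def restart(endpoint):
    # While one endpoint restarts, a request still finds another endpoint.
    observed.append((endpoint, client.select()))


rolling_restart(client, restart)
```

Each entry in `observed` pairs the endpoint being restarted with the endpoint that served a request at that moment; the two are never the same host.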
• We assign 2 members to the “On-Call” rotation every week to do
  • First-aid actions for any production issue
    • In the worst case, we just restart the service, switch it to “maintenance mode”, or shut it down…
  • Filing tickets to track the issue, so that
  • Prepare an outage report to collect
    • Cause
    • Resolution
    • Action Items
OUTAGE HANDLING & POSTMORTEM
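The three items an outage report collects can be sketched as a small record type. The field names follow the list above; the `title` field, the completeness check, and all sample values are hypothetical and not taken from a real Shop report.

```python
from dataclasses import dataclass, field


@dataclass
class OutageReport:
    title: str
    cause: str = ""
    resolution: str = ""
    action_items: list = field(default_factory=list)

    def is_complete(self):
        """A report is ready for review once all three sections are filled in."""
        return bool(self.cause and self.resolution and self.action_items)


# Hypothetical example values, for illustration only.
report = OutageReport(title="Elevated API error rate")
draft_complete = report.is_complete()

report.cause = "A configuration change was deployed without a flag"
report.resolution = "Rolled back the change"
report.action_items.append("Add a check for this configuration to the regression test")
final_complete = report.is_complete()
```

Keeping the report structured makes it possible to count and compare outages later, which is what the analysis on the following slide relies on.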
OUTAGE REPORT EXAMPLE
• According to a report analysis by our SET (Software Engineer in Test) member, in early 2017 Shop services had outages biweekly !??!! orz
• The good point is that we can visualize such issues and understand the current situation in numbers.
(OUTAGE SECRET)
• “Site Reliability Engineering”: http://landing.google.com/sre/book.html
• Armeria: https://github.com/line/armeria
FURTHER READING
THANK YOU
