Continuous Integration and Delivery
for the HERE Mobile Apps
How we deliver mainline to millions of
Android and iOS users every other
week.
DevOps Meetup Berlin
Stefan Verhoeff, 28 September 2016
@stefan_verhoeff
All opinions expressed in this deck are the author's own and do not
necessarily represent the official view of HERE
Intro
• Hi, I’m Stefan
• I run a team called Wookiee
who build and run CI/CD for
Android and iOS
• We build the HERE WeGo
App, give it a try ;-)
HERE WeGo App
Agenda
• CI/CD for Mobile?
• Our approach
– Key objectives + metrics
– Pipelines
– Platform architecture
– Testing
– Dashboards
– Releasing
• Challenges & plans
• Q&A
DevOps + CI/CD
• Who here does CI/CD? On mobile?
• Wikipedia says:
– Continuous Integration
Continuous integration (CI) is the practice of merging all developer
working copies to a shared mainline several times a day
– Continuous Delivery
Continuous delivery (CD) is a software engineering approach in which
teams produce software in short cycles, ensuring that the software can
be reliably released at any time. It aims at building, testing, and
releasing software faster and more frequently.
DevOps + CI/CD
• CI is feedback to the developer
– Did I build it right?
– Did I break something?
• CD is feedback + deliver value to
the user
– Did we build the right thing?
– Did the changes cause any
issues?
State of Mobile CI/CD
• We know about CI/CD for web
and cloud, what is different for
Mobile?
• Delivery = App Store / Play
Store -> Brick wall
• You can still keep your App
shippable at all times, even if
you don’t ship every build
Challenges of Mobile CI/CD
• Dealing with many devices
and OS versions
• App store approval cycle
takes time. Apple, looking at
you!
• High cost of failure. Can’t roll
back easily.
• CI tools for Mobile lacking in
the past, but starting to pop
up lately.
CI related tools on Mobile
• Platforms
– Google Firebase, Fabric, AWS Mobile
• App store automation
– Fastlane
• Distribution
– HockeyApp, TestFlight, Play Store channels
• Cloud devices
– Amazon Device Farm, Google Test Cloud, Bitbar TestDroid
• Cloud CI
– Travis, CircleCI, Bitrise, buddybuild
Monolith, anyone?
Objectives + key metrics
• Maximize developer feedback
– Coverage: lines + test cases > 80%
– Speed: feedback time Pre-Submit < 15 min
– Reliability: pipeline green (not red) > 90% of the day
• Maximize user delight
– App Store rating > 4.5 stars
– NPS > 30
– Bug limit < 30 open bugs + SLA
– Limit # critical bugs on production < 5 per year
• Visual dashboards to track
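The targets on this slide can be checked mechanically before they go on a dashboard. The following is a minimal sketch, not the team's actual tooling: the metric keys (such as `green_pct_of_day`) are made up for illustration, and the reliability target is read as "share of the day the pipeline is green", which is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Target:
    name: str                        # hypothetical key used in the measurements dict
    description: str                 # human-readable target, taken from the slide
    is_met: Callable[[float], bool]  # predicate applied to the measured value


# Thresholds are the ones listed on the slide; the key names are assumptions.
TARGETS: List[Target] = [
    Target("coverage_pct", "coverage > 80%", lambda v: v > 80),
    Target("presubmit_minutes", "Pre-Submit feedback < 15 min", lambda v: v < 15),
    Target("green_pct_of_day", "pipeline green > 90% of day", lambda v: v > 90),
    Target("store_rating", "App Store rating > 4.5", lambda v: v > 4.5),
    Target("nps", "NPS > 30", lambda v: v > 30),
    Target("open_bugs", "open bugs < 30", lambda v: v < 30),
    Target("critical_prod_bugs_per_year", "critical production bugs < 5 per year", lambda v: v < 5),
]


def missed_targets(measured: Dict[str, float]) -> List[str]:
    """Return the descriptions of targets that are missed or not measured."""
    missed = []
    for target in TARGETS:
        value = measured.get(target.name)
        if value is None or not target.is_met(value):
            missed.append(target.description)
    return missed
```

A non-empty result for the bug limit is the kind of signal that could drive a "stop the line" decision.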
HERE implementation of CI
One Mainline per Product
• Reduces time to market
• Creates responsibility
• Creates focus
• Improves predictability
To make the shared “Mainline” work, we introduce 2 enablers
Code Review
• Protects the mainline
• Provides direct feedback on each change
Pre-submit Testing
• Improves code quality
• Enables cross-team collaboration
Code reviews through Gerrit
• Double combo:
– Pre-submit: CI feedback
isolated from other developers
– Review: Peer developer
feedback
• Reviews help knowledge
sharing across teams
Code Review and Pre-submit Testing
(flow between the Developer's machine and the Code Hosting Server)
1. Download code
2. New change created locally (create/modify files)
3. Upload change for review with peers
4. Pre-submit: Code Review + Automated Tests
5. Accepted?
   – No: 5.a Rework
   – Yes: 5.b Submit
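The decision at the end of this flow can be written as a small gate: a change is submitted only when both the peer review and the automated pre-submit tests accept it, otherwise it goes back for rework. This is an illustrative sketch only; the function name and the two boolean inputs are assumptions, and in the real setup Gerrit and Jenkins make this decision.

```python
from typing import List, Tuple


def presubmit_decision(review_accepted: bool, tests_passed: bool) -> Tuple[str, List[str]]:
    """Return ("submit" | "rework", reasons) for a change that finished pre-submit."""
    reasons = []
    if not review_accepted:
        reasons.append("code review not accepted")
    if not tests_passed:
        reasons.append("automated tests failed")
    # Step 5.b (submit) needs both checks; anything else is step 5.a (rework).
    return ("submit" if not reasons else "rework", reasons)
```

Returning the reasons alongside the decision gives the developer direct feedback on what to rework.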
HERE CI stages
[Figure: CI stages along the Mainline. Stage labels: Pre-submit Verification, Submit Verification, Longer Verification < 1h, Verification > 1h, Optional Manual Tests, Can be released]
Why we have stages
Platform architecture
Our device labs
Testing
UI test with mocks
UI test end to end
Exploratory + Regression + Compatibility
Drive in the field
Unit tests class level
Unit tests component level
1 day
2 days
100s
500+
100s
1000s
Avoiding flaky tests
• Flaky tests are a huge source of waste,
we fight them with fire
• Mocking dependencies and network
• Moving tests down to component level
• Re-run after failing
• Stable testing culture. Google’s Testing
on the Toilet series
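"Re-run after failing" only helps if a pass on a later attempt is still reported as flaky and not silently counted as green. A minimal sketch of that idea follows; the function name, the three outcome labels, and the default of three attempts are assumptions for illustration, not the deck's actual test runner.

```python
from typing import Callable


def run_with_rerun(test: Callable[[], bool], max_attempts: int = 3) -> str:
    """Run a test up to max_attempts times and classify the outcome.

    Returns "passed" if the first attempt passes, "flaky" if a later
    attempt passes, and "failed" if no attempt passes.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            ok = test()
        except Exception:
            ok = False  # an exception counts as a failed attempt
        if ok:
            return "passed" if attempt == 1 else "flaky"
    return "failed"
```

Counting the "flaky" outcomes per test over time is one way to feed a history view and decide which tests to fix, mock, or move down to component level.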
Visualizing test results – history matrix
Test status dashboard
Dashboards
• Gathering data and creating dashboards is probably one
of the best things we've done to grow awareness
• Key metrics: these are in everyone's (dev + ops) annual
objectives!
– Coverage
– Performance
– Stability
• Graphite + Grafana
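Getting a CI metric into Graphite, where Grafana can chart it, can be as small as writing one line per data point in Graphite's plaintext format (`<path> <value> <timestamp>`). The sketch below assumes a plain Carbon listener on the default plaintext port 2003; the metric path is made up, and a hosted Graphite service may require extra details such as an API key prefix that are not shown here.

```python
import socket
import time
from typing import Iterable, Optional, Tuple


def graphite_lines(metrics: Iterable[Tuple[str, float]],
                   timestamp: Optional[int] = None) -> bytes:
    """Format (path, value) pairs as Graphite plaintext protocol lines."""
    ts = int(time.time()) if timestamp is None else timestamp
    return "".join(f"{path} {value} {ts}\n" for path, value in metrics).encode("ascii")


def send_to_graphite(payload: bytes, host: str, port: int = 2003) -> None:
    """Send preformatted lines to a Carbon plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Example payload with a hypothetical metric name:
payload = graphite_lines([("ci.presubmit.duration_minutes", 12.5)], timestamp=1475020800)
```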
CI performance dashboard
CI stability dashboard
Team dashboard
Jenkins wall monitor
Keeping CI stable
• Wall monitor and team
dashboard visible in area
• Stop the line if bug limit
reached
• Build Police: rotating
developer who spots
failures and finds
who/what caused it
• HERO*: CI Engineer
available on chat for
support
* Helpful Emergency Responsible Operator
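The Build Police job of finding "who/what caused it" usually starts from the first red build after the last green one. A minimal sketch of that lookup follows; the `Build` record and its fields are hypothetical and do not reflect the Jenkins data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Build:
    number: int
    green: bool
    changes: List[str]  # IDs of changes first included in this build


def suspect_changes(history: List[Build]) -> List[str]:
    """Return the changes that entered in the first build of the current red streak.

    history is ordered oldest first. An empty list means the latest build is green.
    """
    if not history or history[-1].green:
        return []
    first_red = len(history) - 1
    # Walk back while the previous build is also red.
    while first_red > 0 and not history[first_red - 1].green:
        first_red -= 1
    return list(history[first_red].changes)
```

Changes that landed in later red builds are deliberately left out: they arrived on an already broken pipeline, so they are less likely to be the cause.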
Releasing
• Release train
– Ship on a regular schedule
– Always release master*
– New features hidden behind a feature flag
• Gradual rollout
– Activate features without App Store release!
– Roll out percentage wise to measure impact
– Works great combined with A/B testing
– Reduces risk and enables learning
* We do create a release branch but just to avoid regressions while we manually test
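Percentage-wise rollout behind a feature flag is commonly done by hashing each user into a stable bucket and enabling the feature for buckets below the current percentage. The sketch below shows that general technique; it is not the mechanism HERE or any specific A/B testing product uses, and the function names and the feature name in the example are made up.

```python
import hashlib


def rollout_bucket(feature: str, user_id: str) -> int:
    """Map a (feature, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(feature: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if the feature is active for this user at the given percentage."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return rollout_bucket(feature, user_id) < rollout_percent


# Example: check one user against a 10% rollout of a hypothetical feature.
enabled = is_enabled("new_route_planner", "user-42", 10)
```

Because the bucket depends only on the feature and user ID, raising the percentage keeps everyone who already had the feature and adds new users, which keeps impact measurements comparable between rollout steps.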
How we release
• Push to App Store / Play Store still manual but automation
planned
• Weekly: Alpha internal release
• Every other week
– Beta
– Production
• Release checklist
– Comms, legal, privacy, platform, ...
• Release results tracking
Release schedule
Production feedback
• Metrics: analytics, logs and crash data
• User feedback, NPS, store reviews
CI as Code
• Jenkins Job DSL plugin
• Treat it as code:
– Version control
– Abstractions, re-use and
refactoring
– Tests
– CI for CI code (infinite
recursion error)
– Code reviews
– Local testing
// Jenkins Job DSL (Groovy): generate one nightly pipeline per branch.
// Constants and NightlyPipelineBuilder are project-specific helpers not shown on the slide.
[
    Constants.Branches.MASTER,
    Constants.Branches.RELEASE
].each { branch ->
    def pipelineBuilder = new NightlyPipelineBuilder(this)
    pipelineBuilder.with {
        defaultConfiguration([
            client      : Constants.Description.Contacts.WOOKIEE,
            pipelineType: Constants.Description.PipelineType.NIGHTLY
        ])
        baseFolders([Constants.Folders.PROJECT, branch, Constants.Folders.NIGHTLY])

        // Trigger on the manifest repository branch that matches this pipeline
        triggerManifestRepoBranch = branch == Constants.Branches.MASTER ? 'master' : 'release/heresuite'
        triggerManifestRepoFile = 'heresuite-android.xml'
        triggerPublishArtifacts = ['manifest.xml']
        defaultBuildJobName =
            "${Constants.Folders.PROJECT}/${branch}/${Constants.Folders.NIGHTLY}/" +
            'build_heresuite-android_arm_dev_debug_play'
        addTrigger()

        // Pass the branch on to the build and test phases
        buildPhase.predefinedParams += [BRANCH: branch]
        testPhase.predefinedParams += [BRANCH: branch]

        // Jobs that make up the nightly pipeline
        addBuildJobs()
        addTestEndurance()
        addTestEndurance10()
        addTestOfflineWithRerun()
        addTestUnitTestCoverage()
    }
}
Challenges & solutions
• Long release cycles for App -> Move to release train + gradual rollout
• Long SDK integration cycle -> Early manual testing and connected CI systems
• Constant bug fighting in App teams -> Tech improvements, refactoring, test automation, RCA
• Flaky tests -> Visualize, prune test suite, mock, component tests
• No insight into state of CI KPIs -> Collect metrics and create dashboards
• Manual job maintenance hell -> CI as Code
• Taking action on failures -> Visible screens, HERO and build police roles
• Device maintenance and capacity -> Buy more hardware, optimize use, move to cloud
Next plans / on the radar
• More tests for CI code
• Jenkins Pipeline plugin
• Fastlane for iOS
• App Store / Play Store
automation
• TestDroid cloud testing
• Multiplatform end to end tests
Thank you!
Bonus
Pipelines for Mobile
• The stages of our CI system
– Pre-Submit
– Post-Submit
– Hourly
– Nightly
Mobile specific concerns
• Signing certificates
• Distributing Dev and Beta builds
– HockeyApp
– TestFlight App Store
– Google Play Store Alpha/Beta channel
• App Store / Play Store automation
CI node dashboard
Device lab dashboard
CI building blocks
• Builds
• Static Analysis
• Unit tests
• Functional tests
Test types and life cycle
• Gateway
– Set of representative tests run in Pre-Submit
– Very stable and fast
• Hourly
– Big set of regression tests. Takes an hour to run, so it runs on a timer
• Unproven
– New tests first need to prove that they are stable enough before being added to the
hourly test set
• Unstable
– Known unstable tests. Either fix or delete
• End to end
– All above tests use mocks. But there is also a set of end-to-end tests. These are more brittle, so they have
high maintenance costs. They do find more issues.
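The life cycle above can be expressed as a rule that moves a test between suites based on its recent results. This is a sketch under stated assumptions: the suite names come from the slide, but the thresholds (`REQUIRED_PASSES`, `MAX_FLIPS`) and the use of pass/fail flips as the instability signal are invented for illustration.

```python
from typing import Sequence

REQUIRED_PASSES = 20  # hypothetical: consecutive passes needed to leave "unproven"
MAX_FLIPS = 1         # hypothetical: more pass/fail flips than this means "unstable"


def count_flips(results: Sequence[bool]) -> int:
    """Count how often consecutive results switch between pass and fail."""
    return sum(1 for a, b in zip(results, results[1:]) if a != b)


def next_suite(current: str, results: Sequence[bool]) -> str:
    """Decide which suite a test belongs in; results are ordered oldest first."""
    if current == "unproven":
        window = results[-REQUIRED_PASSES:]
        if len(window) == REQUIRED_PASSES and all(window):
            return "hourly"  # the test has proven itself stable
    elif current == "hourly":
        if count_flips(results) > MAX_FLIPS:
            return "unstable"  # flips back and forth: fix or delete
    return current
```

A single flip from pass to fail is treated as a possible real regression and keeps the test in the hourly set; only repeated flipping moves it to the unstable group.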
Non-functional tests
• Performance KPI
• Monkey
• Power consumption
• Memory

Editor's Notes

  • #2 This talk will describe how we use a scaled approach for CI/CD. The system is set up for iOS and Android Apps, but many of the concepts presented are applicable to any type of application. We will cover the different pipeline stages a change goes through, how we automate many levels of testing, how we treat our CI infrastructure as code, which key metrics we use, and how we track them on dashboards. All this demonstrates how we can get close to Continuous Delivery for platforms still ruled by App stores.
  • #6 What is CI/CD?
    – DevOps, the wall between Dev and Ops
    – DevOps three ways: http://itrevolution.com/a-personal-reinterpretation-of-the-three-ways/
    – Industry / Wikipedia definition
    – HERE implementation
    – CI is feedback: did I build it right? Did I break something?
    – CD is feedback + deliver value: did I build the right thing (A/B test + store + NPS + analytics)? Did it break (crashes)?
  • #8 http://www.dangerouscreation.com/wp-content/uploads/2011/08/crash1.jpg
  • #9 Challenges to do CI/CD on Mobile:
    – App store approval cycle (Apple, looking at you! Came down from 7 days to 1 day)
    – Device compatibility, lots of test cases (Android...)
    – Dealing with devices and hardware: Android phones, Apple Mac mini. Devices are flaky, battery exploding. State of Puppet on Mac is not optimal. Xcode setup, try to automate that.
    – High cost of failure / longer recovery (MTTR). Special case: driver safety
    – CI tools for Mobile lacking. Read about CI/CD tools: it's all about cloud + web. Tools: TestDroid, Apptimize, Fastlane, HockeyApp
  • #11 We have microservices now on the cloud. What do we have on Mobile?
  • #12 Goal = Feedback: quantity / coverage, speed, reliability
    – Metrics: coverage (lines + test cases), speed (feedback time Pre-Submit), reliability (red pipeline % of day)
    – Grafana dashboards
  • #18 Gerrit Jenkins Plugins: Matrix, MultiJob, Wall display, THX, Xcode Job DSL Artifactory AWS Docker Repo TestDroid, Apptimize, HockeyApp
  • #19 Which is cool, but feels a bit like being back in the 2000s. Phones are not built to be charged 24/7. Batteries bloat and they fail.
  • #20 Exploratory tests / drive testing. Car driving safety is critical!
    – Frameworks: Espresso, Calabash
    – Avoid flaky tests, mocking, re-run. FAST principles
    – Test types and life cycle: Gateway, Hourly, Unproven, Unstable, End to end
    – Visualize: test history matrix, Graphite dashboards, wall display
    – Non-functional: Performance KPI, Monkey, Power consumption, Memory
  • #21 https://testing.googleblog.com/search/label/TotT
  • #22 https://sourceforge.net/projects/junitth/
  • #23 https://www.hostedgraphite.com/d79d37bc/grafana/dashboard/db/heresuite-android-teams-espressotests
  • #24 Objectives: https://confluence.in.here.com/display/~gmontgom/2016+Objective+Progress+Tracking
    – Gathering data and creating dashboards is probably one of the best things we've done
    – Key metrics: these are in everyone's (dev + ops) annual objectives! Performance, stability, coverage
    – Dashboards: performance, stability, team dashboard, wall display, test state, slave label statistics, device lab
  • #25 https://www.hostedgraphite.com/d79d37bc/grafana/dashboard/db/consumer-cci-vs-ams-pipeline-duration
  • #26 https://www.hostedgraphite.com/d79d37bc/grafana/dashboard/db/marshall-kpi-jenkins-stability-working-hours-dsl
  • #27 https://www.hostedgraphite.com/d79d37bc/grafana/dashboard/db/heresuite-android-teams-cci
  • #29 Team organization and process
    – DevOps team vs. no DevOps team
    – Multi-mission team setup and LaunchPad Ops Kanban for DevOps team
    – HERO role for team support + allow focus
    – ChatOps: hotline for CI support
    – Build police to follow up on broken builds/tests
  • #30 Release train: Spotify Facebook SAFe framework
  • #31 Alpha: weekly
    – Alternate weekly: beta + production
    – Always ship master. But release branch for test freeze
    – Feature flags + A/B testing to roll out safely, reduce risk and learn
    – Metrics: analytics, logs and crash data
    – Metrics: release success stats
    – User feedback, NPS, store reviews
    – Tracking releases and A/B tests on wiki
    – Release results: https://hmcs-my.sharepoint.com/personal/frank_janisch_here_com/_layouts/15/WopiFrame.aspx?guestaccesstoken=rCkVbKnu0aSEad%2bWN5AFolSmyCKOZFDCSF3jigqxErM%3d&docid=0c9e5b54c1f4644fcaf0ebfdb96f600b4&action=view
    – Release checklist: https://confluence.in.here.com/display/ConsumerExperience/Android+Release+Checklist
  • #32 https://confluence.in.here.com/display/ConsumerExperience/05+Releases
  • #34 https://wiki.jenkins-ci.org/display/JENKINS/Job+DSL+Plugin
  • #35 Challenges faced
    – Long release cycles for App. Target: bi-weekly delivery train. Start monthly, move closer to target. Master only. Feature flags. Increase test automation and focus on stability. Start building trust in the automation. Inspect and adapt testing cycle, some weeks to 3 days
    – Long SDK integration cycle, takes weeks of work. Only few major releases per year. Always many new issues discovered during integration. Long wait time to get features/fixes in, fight for priority with other business units. Solution ongoing: weekly exploratory testing of SDK master; pipelines that trigger App CI for feedback. Next: process to get feedback flowing, build police; SDK and MOS specific test cases; start contributing code to SDK and MOS
    – Manual job maintenance nightmare. Same change, many places. No re-use. Solution: CI as code, Job DSL. All build/test code into Gerrit
    – Constant bug fighting App teams. Project with focus on technical improvements. Includes dev and CI engineers. KPIs: feedback coverage, performance and stability. Build dashboards
    – Flaky tests. THX. Unstable/Unproven tests grouping. Inspect and remove many tests with low value. Re-run failed tests
    – Not enough insight into performance. Start collecting metrics from Jenkins. Store in Graphite, dashboard with Grafana
    – Taking action on failures. Build police
    – Capacity issues build/device nodes. Pre-Submit inspect commit and run relevant jobs only. Monitor queue. Automate recovery. Buy newer hardware
  • #36 Jenkins Pipeline; more tests for CI code; Fastlane; App Store / Play Store automation; TestDroid; multiplatform end to end tests
  • #39 Pipelines, CI stages: Pre-Submit, Post-Submit, Hourly, Nightly. KEY LEARNING: Pre-Submit testing changes everything! It avoids bothering other people with your failing code before merging to master.
  • #42 https://www.hostedgraphite.com/d79d37bc/grafana/dashboard/db/android-device-dashboard
  • #43 Builds: flavors, Release / Debug. Android: Gradle
    – Static code analysis: fast and reliable. Good to have, but not enough
    – Android: Codenarc, FindBugs, CheckStyle, commit hook custom rules (commit message)
    – iOS: TBD