SlideShare a Scribd company logo
1 of 24
Download to read offline
Verify Your Kubernetes Clusters
with Upstream e2e Tests
Kenichi Omichi - NEC
Open Source Summit NA 2018
8/29/2018
Agenda
About me
Upstream e2e tests
Motivation to use e2e tests for production
How to use e2e tests
Test failures, solutions and workarounds
- PV Protection tests
- StatefulSet tests
- HPA tests
- MetricsGrabber tests
- Resource usage tracking tests
Summary
3 © NEC Corporation 2018
About me
▌Kenichi Omichi from NEC
▌Software engineer
▌Open source contributor
▌OpenStack
ïŹOpenStack/QA core developer, ex-PTL (Project Team Lead)
ïŹOpenStack/Nova core developer
▌Kubernetes
ïŹEstablished community member
ïŹCKA (Certified Kubernetes Administrator)
4 © NEC Corporation 2018
Upstream e2e tests
▌Test end-to-end (e2e) behavior of system level
ïŹUnit tests cannot catch unforeseen changes at system level sometimes
▌CI for pull requests
ïŹPull requests need to pass subset of e2e tests before merging
▌e2e tests continue increasing
ïŹThe latest k8s(for v1.12) contains
1033 tests
600
700
800
900
1000
1100
v1.7 v1.8 v1.9 v1.10 v1.11 master
e2e specs
5 © NEC Corporation 2018
Motivation to use e2e tests for production
▌k8s contains a lot of features and a lot of corresponding e2e tests
ïŹE2e test number is still increasing
▌e2e tests are useful for verifying k8s clusters
ïŹWe can find misconfigurations before productions
ïŹRarely you face test failures
▌Investigations of test failures are good chances to learn k8s deeply
ïŹWe can understand overview/detail of k8s features and test expectations
6 © NEC Corporation 2018
How to use e2e tests – Build e2e test binary
▌Check version of target k8s cluster
▌Download k8s source code and checkout same version
▌Build e2e test binary
$ go get k8s.io/kubernetes
$ cd $GOPATH/src/k8s.io/kubernetes
$ git checkout refs/tags/v1.11.1
$ kubectl version


Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1",
$ sudo /usr/local/go/bin/go run hack/e2e.go -- --build
7 © NEC Corporation 2018
How to use e2e tests – Run all e2e tests
▌Choose provider type for your cloud platform:
ïŹskeleton, local, gce, gke, aws
▌(skeleton) Export KUBE_MASTER_IP and KUBE_MASTER
▌Run all e2e tests
$ export KUBE_MASTER_IP="192.168.1.108:6443"
$ export KUBE_MASTER=k8s-master
$ export KUBECONFIG=$HOME/admin.conf
$ go run hack/e2e.go -- --provider=skeleton --test
8 © NEC Corporation 2018
How to use e2e tests – Run Conformance tests
▌What are conformance tests
ïŹA subset of e2e tests (167/999 in v1.11)
ïŹCorresponding tests for interoperable features sig-architecture has approved
ïŹk8s community expects to pass on any k8s clusters
ïŹTo get certification of Certified Kubernetes program,
every vendor needs to pass conformance tests
▌Run conformance tests
$ export KUBECONFIG=$HOME/admin.conf
$ export KUBERNETES_CONFORMANCE_TEST=true
$ go run hack/e2e.go -- --provider=skeleton --test
--test_args="--ginkgo.focus=[Conformance]"
9 © NEC Corporation 2018
We would face many failures
▌Conformance tests can be operated easily
ïŹJust 167 tests of 999 all e2e tests
ïŹThese tests have been selected by expecting to be passed on any clusters
ïŹIt takes 1.5 hours only
▌But a little difficult to run ALL e2e tests sucessully
ïŹIt could take 12 hours if many failures
ïŹMuch timeout without any configuration, retry-number and timeout are hardcord-
ed in test code
There is a lot of points we need to take care
to use e2e tests for productions
Test failures, Solutions and Workarounds
11 © NEC Corporation 2018
Test environment
▌HW
ïŹDell desktop PCs * 2: Core i5-6500 CPU @3.2GHz, 16GB MEM
ïŹIntel NUC PCs *2: Core i3-7100 @2.4GHz, 32GB MEM
▌IaaS: OpenStack Queens
ïŹOS: Ubuntu 16.04 LTS
ïŹUbuntu package
▌Kubernetes v1.11.1
ïŹ2 VMs(master, node) on top of IaaS
ïŹOS: Ubuntu 16.04 LTS
ïŹInstall tool: kubeadm
ïŹNetwork: Flannel
CTRLCPU01
CPU02CPU03
OpenStack IaaS
master node
12 © NEC Corporation 2018
Failure case 1: PV Protection tests
▌2 PV Protection tests were failed
▌Root Cause
ïŹTests were failed to check PV Protection flag in PV
due to nonexistent flag
ïŹThe flag is set after the PV phase becomes “Available”
from “Pending”, but tests didn’t check the phase
ïŹe2e test machine was faster than k8s cluster,
so the flag was nonexistent
▌Solution
ïŹAdd waiting for “Avaiable” phase before checking the flag in tests
ïŹPR: https://github.com/kubernetes/kubernetes/pull/67520 (Merged 8/20/2018)
Verify "immediate" deletion of a PV that is not bound to a PVC
Verify that PV bound to a PVC is not removed immediately
Persistent Volume
phase:
Pending -> Available
finalizers:
kubernetes.io/pv-
protection
13 © NEC Corporation 2018
You might want to backport the PR to e2e
▌But e2e doesn’t start if changing e2e code due to version difference
▌Specify --check-version-skew=false to skip version check
$ go run hack/e2e.go -- --provider=skeleton --test
error: server version (&version.Info{Major:"1", Minor:"11", ..) differs from
client version (version.Info{Major:"1", Minor:"11+", ..)
exit status 1
$ go run hack/e2e.go -- --provider=skeleton –test --check-version-skew=false
Client Version: version.Info{Major:"1", Minor:"11+", ...
Server Version: version.Info{Major:"1", Minor:"11", ...
...
Running Suite: Kubernetes e2e suite
14 © NEC Corporation 2018
Failure case 2: StatefulSet tests
▌3 StatefulSet tests were failed
▌Root Cause
ïŹSince v1.11 k/k in-tree openstack-cloud-provider is deprecated, and the external
openstack-cloud-provider is recommended
ïŹDocument issue: Tried using the external one for PersistentVolume, but
misconfigured StorageClass because StorageClass example was written for the
internal one
▌Solution
ïŹSpecify “openstack.org/standalone-cinder” instead of “kubernetes.io/cinder”
as the provisioner in StorageClass
ïŹPR: https://github.com/kubernetes/cloud-provider-openstack/pull/261 (Merged)
should adopt matching orphans and release non-matching pods
should not deadlock when a pod's predecessor fails
should provide basic identity
15 © NEC Corporation 2018
Failure case 3: HPA tests - 1
▌8 HPA(Horizontal Pod Autoscaling) tests were failed
▌How does HPA work
[1] ReplicationController light [It] Should scale from 1 pod to 2 pods
[2] ReplicationController light [It] Should scale from 2 pods to 1 pod
[3] Should scale from 1 pod to 3 pods and from 3 to 5
[4] Should scale from 5 pods to 3 pods and from 3 to 1
[5] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5
[6] ReplicaSet [It] Should scale from 5 pods to 3 pods and from 3 to 1
[7] ReplicationController [It] Should scale from 1 pod to 3 pods and from 3 to 5 and verify decision stability
[8] ReplicationController [It] Should scale from 5 pods to 3 pods and from 3 to 1 and verify decision stability
HPA
RC/RS/Deployment
scale
metrics.k8s.io API
pod pod pod

fetchupdate
16 © NEC Corporation 2018
Failure case 3: HPA tests - 2
▌Root Cause
ïŹHPA feature requires metrics-server to fetch CPU usage since k8s v1.9, but I didn’t
deploy it
▌Solution
ïŹDeploy metrics-server
HPA
metrics.k8s.io API
fetch
metrics-server
provide
$ git clone https://github.com/kubernetes-incubator/metrics-server
$ cd metrics-server/
$ kubectl create -f deploy/1.8+/
kubelet kubelet kubelet
fetch
17 © NEC Corporation 2018
Failure case 3: HPA tests - 3
▌Still 6 HPA(Horizontal Pod Autoscaling) tests were failed
▌What does test [3] expect the result in autoscaling 1 to 3 pods
[1] ReplicationController light [It] Should scale from 1 pod to 2 pods
[2] ReplicationController light [It] Should scale from 2 pods to 1 pod
[3] Should scale from 1 pod to 3 pods and from 3 to 5
[4] Should scale from 5 pods to 3 pods and from 3 to 1
[5] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5
[6] ReplicaSet [It] Should scale from 5 pods to 3 pods and from 3 to 1
[7] ReplicationController [It] Should scale from 1 pod to 3 pods and from 3 to 5 and verify decision stability
[8] ReplicationController [It] Should scale from 5 pods to 3 pods and from 3 to 1 and verify decision stability
HPA
RC/RS/Deployment
scale
pod
Solved
1. Set target cpu: 20%
2. Make 50% CPU workload
3. Scale to 3 pods:
50% / 20% = 2.5
pod pod
18 © NEC Corporation 2018
Failure case 3: HPA tests - 4
▌Root Cause
ïŹActual CPU workload (100%) was bigger than expected (50%)
ïŹThen HPA scaled pods to 5 instead of expected 3:
100% / 20% = 5
ïŹCPU workload is generated by many resource consumer processes.
Each process tried to make 2% CPU workload and 25 processes existed,
but it is hard to make such small workload and actual workload was 4%:
Test expectation: 50% = 2% * 25 processes
Actual workload: 100% = 4% * 25 processes
▌Solution
ïŹChange the CPU workload of each process to 10% for the test stability
50% = 10% * 2 processes + 5 * processes
ïŹPR: https://github.com/kubernetes/kubernetes/pull/67053 (Merged 8/7/2018)
ïŹAll HPA tests have passed successfully
19 © NEC Corporation 2018
Failure case 4: MetricsGrabber test
▌1 MetricsGrabber test was failed
ïŹk8s API call in the test was failed
▌Root Cause
ïŹkube-scheduler listens 10251/tcp locally(127.0.0.1) and it denies connections to its
node ip-address.
▌Workaround
ïŹChange listening ip-address to the host address in /etc/kubernetes/manifests/kube-
scheduler.yaml
ïŹIssue: https://github.com/kubernetes/kubernetes/issues/67685
should grab all metrics from a Scheduler.
GET /v1/namespaces/kube-system/pods/
kube-scheduler-k8s-master:10251/proxy/metrics
We can get to know what APIs are called in tests with “-v=8”:
$ go run hack/e2e.go -- --test --test_args="-v=8"
20 © NEC Corporation 2018
Failure case 5: Resource usage tracking tests - 1
▌2 Resource usage tracking tests were failed
ïŹTests expect CPU usage of kubelet is under 10%, but it was over 12%
▌Root Cause
ïŹk8s VMs were running on NUC PCs
ïŹCPU speed is not enough
‱ NUC PCs: i3-7100 @2.4GHz
‱ Desktop PCs: i5-6500 CPU @3.2GHz
resource tracking for 0 pods per node
resource tracking for 100 pods per node
CTRLCPU01
CPU02CPU03
master node
21 © NEC Corporation 2018
Failure case 5: Resource usage tracking tests - 2
▌Solution
ïŹMigrate VMs to another host which has better CPUs
ïŹQuestion: Isn’t such test environment supported with e2e test?
ïŹIssue: https://github.com/kubernetes/kubernetes/issues/67621
CTRLCPU01
CPU02CPU03
master
node
22 © NEC Corporation 2018
Summary
▌Comformance tests are useful for verifying basic functions
▌All e2e tests are useful for verifying extended functions (HPA,
StatefulSets, etc.)
ïŹWe can find misconfigurations before productions
ïŹRarely you face test failures
▌Investigations of test failures are good chance to learn k8s deeply
ïŹWe can understand overview/detail of k8s features and test expectations
▌Great to bring test issues up to community
ïŹFew tests can run on some environments (upstream CI)
ïŹDiscuss test results which are gotten from different environments, we can make tests more stable
ïŹThat is great contribution to community by making development smooth
23 © NEC Corporation 2018
References
▌e2e tests
ïŹhttps://github.com/kubernetes/community/blob/master/contributors/devel/e2e-
tests.md
▌Conformance tests
ïŹhttps://github.com/kubernetes/community/blob/master/contributors/devel/confor
mance-tests.md
▌Certified Kubernetes Program
ïŹhttps://www.cncf.io/certification/software-conformance/
Verify Your Kubernetes Clusters with Upstream e2e tests

More Related Content

What's hot

Reliability Testing in OPNFV
Reliability Testing in OPNFVReliability Testing in OPNFV
Reliability Testing in OPNFVOPNFV
 
Continuous delivery with jenkins pipelines @devops pro moscow
Continuous delivery with jenkins pipelines @devops pro moscow Continuous delivery with jenkins pipelines @devops pro moscow
Continuous delivery with jenkins pipelines @devops pro moscow Roman Pickl
 
Jenkins Workflow Webinar - Dec 10, 2014
Jenkins Workflow Webinar - Dec 10, 2014Jenkins Workflow Webinar - Dec 10, 2014
Jenkins Workflow Webinar - Dec 10, 2014CloudBees
 
Turbocharge Your Automation Framework to Shorten Regression Execution Time
Turbocharge Your Automation Framework to Shorten Regression Execution TimeTurbocharge Your Automation Framework to Shorten Regression Execution Time
Turbocharge Your Automation Framework to Shorten Regression Execution TimeJosiah Renaudin
 
Bluegreenpres
BluegreenpresBluegreenpres
BluegreenpresPaul_Ladyman
 
Net app import_utility_migration_guide
Net app import_utility_migration_guideNet app import_utility_migration_guide
Net app import_utility_migration_guideRony Melo
 
Effective Testing with Ansible and InSpec
Effective Testing with Ansible and InSpecEffective Testing with Ansible and InSpec
Effective Testing with Ansible and InSpecNathen Harvey
 
Why we didn't catch that
Why we didn't catch thatWhy we didn't catch that
Why we didn't catch thatgaoliang641
 
How do you implement Continuous Delivery?: Part 5 - Deployment Patterns
How do you implement Continuous Delivery?: Part 5 - Deployment Patterns How do you implement Continuous Delivery?: Part 5 - Deployment Patterns
How do you implement Continuous Delivery?: Part 5 - Deployment Patterns ThoughtWorks Studios
 
Introduction to Test Kitchen and InSpec
Introduction to Test Kitchen and InSpecIntroduction to Test Kitchen and InSpec
Introduction to Test Kitchen and InSpecNathen Harvey
 
Effective Testing with Ansible and InSpec
Effective Testing with Ansible and InSpecEffective Testing with Ansible and InSpec
Effective Testing with Ansible and InSpecNathen Harvey
 
Agile Bodensee - Testautomation & Continuous Delivery Workshop
Agile Bodensee - Testautomation & Continuous Delivery WorkshopAgile Bodensee - Testautomation & Continuous Delivery Workshop
Agile Bodensee - Testautomation & Continuous Delivery WorkshopMichael Palotas
 
Regain Control Thanks To Prometheus
Regain Control Thanks To PrometheusRegain Control Thanks To Prometheus
Regain Control Thanks To PrometheusEtienne Coutaud
 
Jenkins and Chef: Infrastructure CI and Automated Deployment
Jenkins and Chef: Infrastructure CI and Automated DeploymentJenkins and Chef: Infrastructure CI and Automated Deployment
Jenkins and Chef: Infrastructure CI and Automated DeploymentDan Stine
 
Backup Exec Partner Toolkit
Backup Exec Partner ToolkitBackup Exec Partner Toolkit
Backup Exec Partner ToolkitSymantec
 
Surviving the Script-apocalypse
Surviving the Script-apocalypseSurviving the Script-apocalypse
Surviving the Script-apocalypseDevOps.com
 
Sneak Peek into the New ChangeMan ZMF Release
Sneak Peek into the New ChangeMan ZMF ReleaseSneak Peek into the New ChangeMan ZMF Release
Sneak Peek into the New ChangeMan ZMF ReleaseNavita Sood
 
Chef Compliance & Workflow w/Delivery
Chef Compliance & Workflow w/Delivery Chef Compliance & Workflow w/Delivery
Chef Compliance & Workflow w/Delivery Chef
 
EMC Documentum - xCP 2.x Troubleshooting
EMC Documentum - xCP 2.x TroubleshootingEMC Documentum - xCP 2.x Troubleshooting
EMC Documentum - xCP 2.x TroubleshootingHaytham Ghandour
 

What's hot (20)

Reliability Testing in OPNFV
Reliability Testing in OPNFVReliability Testing in OPNFV
Reliability Testing in OPNFV
 
MidSem
MidSemMidSem
MidSem
 
Continuous delivery with jenkins pipelines @devops pro moscow
Continuous delivery with jenkins pipelines @devops pro moscow Continuous delivery with jenkins pipelines @devops pro moscow
Continuous delivery with jenkins pipelines @devops pro moscow
 
Jenkins Workflow Webinar - Dec 10, 2014
Jenkins Workflow Webinar - Dec 10, 2014Jenkins Workflow Webinar - Dec 10, 2014
Jenkins Workflow Webinar - Dec 10, 2014
 
Turbocharge Your Automation Framework to Shorten Regression Execution Time
Turbocharge Your Automation Framework to Shorten Regression Execution TimeTurbocharge Your Automation Framework to Shorten Regression Execution Time
Turbocharge Your Automation Framework to Shorten Regression Execution Time
 
Bluegreenpres
BluegreenpresBluegreenpres
Bluegreenpres
 
Net app import_utility_migration_guide
Net app import_utility_migration_guideNet app import_utility_migration_guide
Net app import_utility_migration_guide
 
Effective Testing with Ansible and InSpec
Effective Testing with Ansible and InSpecEffective Testing with Ansible and InSpec
Effective Testing with Ansible and InSpec
 
Why we didn't catch that
Why we didn't catch thatWhy we didn't catch that
Why we didn't catch that
 
How do you implement Continuous Delivery?: Part 5 - Deployment Patterns
How do you implement Continuous Delivery?: Part 5 - Deployment Patterns How do you implement Continuous Delivery?: Part 5 - Deployment Patterns
How do you implement Continuous Delivery?: Part 5 - Deployment Patterns
 
Introduction to Test Kitchen and InSpec
Introduction to Test Kitchen and InSpecIntroduction to Test Kitchen and InSpec
Introduction to Test Kitchen and InSpec
 
Effective Testing with Ansible and InSpec
Effective Testing with Ansible and InSpecEffective Testing with Ansible and InSpec
Effective Testing with Ansible and InSpec
 
Agile Bodensee - Testautomation & Continuous Delivery Workshop
Agile Bodensee - Testautomation & Continuous Delivery WorkshopAgile Bodensee - Testautomation & Continuous Delivery Workshop
Agile Bodensee - Testautomation & Continuous Delivery Workshop
 
Regain Control Thanks To Prometheus
Regain Control Thanks To PrometheusRegain Control Thanks To Prometheus
Regain Control Thanks To Prometheus
 
Jenkins and Chef: Infrastructure CI and Automated Deployment
Jenkins and Chef: Infrastructure CI and Automated DeploymentJenkins and Chef: Infrastructure CI and Automated Deployment
Jenkins and Chef: Infrastructure CI and Automated Deployment
 
Backup Exec Partner Toolkit
Backup Exec Partner ToolkitBackup Exec Partner Toolkit
Backup Exec Partner Toolkit
 
Surviving the Script-apocalypse
Surviving the Script-apocalypseSurviving the Script-apocalypse
Surviving the Script-apocalypse
 
Sneak Peek into the New ChangeMan ZMF Release
Sneak Peek into the New ChangeMan ZMF ReleaseSneak Peek into the New ChangeMan ZMF Release
Sneak Peek into the New ChangeMan ZMF Release
 
Chef Compliance & Workflow w/Delivery
Chef Compliance & Workflow w/Delivery Chef Compliance & Workflow w/Delivery
Chef Compliance & Workflow w/Delivery
 
EMC Documentum - xCP 2.x Troubleshooting
EMC Documentum - xCP 2.x TroubleshootingEMC Documentum - xCP 2.x Troubleshooting
EMC Documentum - xCP 2.x Troubleshooting
 

Similar to Verify Your Kubernetes Clusters with Upstream e2e tests

Viavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptxViavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptxmani723
 
Auto sre with keptn
Auto sre with keptnAuto sre with keptn
Auto sre with keptnLibbySchulze
 
Automated Virtualized Testing (AVT) with Docker, Kubernetes, WireMock and Gat...
Automated Virtualized Testing (AVT) with Docker, Kubernetes, WireMock and Gat...Automated Virtualized Testing (AVT) with Docker, Kubernetes, WireMock and Gat...
Automated Virtualized Testing (AVT) with Docker, Kubernetes, WireMock and Gat...VMware Tanzu
 
Aerospace maintenance facility increases utilization by 50%, saves money
Aerospace maintenance facility increases utilization by 50%, saves moneyAerospace maintenance facility increases utilization by 50%, saves money
Aerospace maintenance facility increases utilization by 50%, saves moneyIntelligentManufacturingInstitute
 
Technology Primer: Closing the DevOps Loop by Integrating CA Application Perf...
Technology Primer: Closing the DevOps Loop by Integrating CA Application Perf...Technology Primer: Closing the DevOps Loop by Integrating CA Application Perf...
Technology Primer: Closing the DevOps Loop by Integrating CA Application Perf...CA Technologies
 
Testlink Test Management with Teamforge
Testlink Test Management with TeamforgeTestlink Test Management with Teamforge
Testlink Test Management with TeamforgeCollabNet
 
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...Revelation Technologies
 
Chicago DevOps Meetup Nov2019
Chicago DevOps Meetup Nov2019Chicago DevOps Meetup Nov2019
Chicago DevOps Meetup Nov2019Mike Villiger
 
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...Revelation Technologies
 
Tempest scenariotests 20140512
Tempest scenariotests 20140512Tempest scenariotests 20140512
Tempest scenariotests 20140512Masayuki Igawa
 
Continuous Delivery Agiles 2014 Medellin
Continuous Delivery Agiles 2014 MedellinContinuous Delivery Agiles 2014 Medellin
Continuous Delivery Agiles 2014 MedellinDiego Garber
 
1 DevOp vs 1.000 servers - Amazon EC2 and Chef automation intro
1 DevOp vs 1.000 servers - Amazon EC2 and Chef automation intro1 DevOp vs 1.000 servers - Amazon EC2 and Chef automation intro
1 DevOp vs 1.000 servers - Amazon EC2 and Chef automation introThomas Lobinger
 
Anatomy of a Build Pipeline
Anatomy of a Build PipelineAnatomy of a Build Pipeline
Anatomy of a Build PipelineSamuel Brown
 
Continuous Deployment of your Application @JUGtoberfest
Continuous Deployment of your Application @JUGtoberfestContinuous Deployment of your Application @JUGtoberfest
Continuous Deployment of your Application @JUGtoberfestMarcin Grzejszczak
 
JCON_15FactorWorkshop.pptx
JCON_15FactorWorkshop.pptxJCON_15FactorWorkshop.pptx
JCON_15FactorWorkshop.pptxGrace Jansen
 
Continuous Deployment of your Application @SpringOne
Continuous Deployment of your Application @SpringOneContinuous Deployment of your Application @SpringOne
Continuous Deployment of your Application @SpringOneciberkleid
 
Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)CIVEL Benoit
 
Cerberus_Presentation1
Cerberus_Presentation1Cerberus_Presentation1
Cerberus_Presentation1CIVEL Benoit
 

Similar to Verify Your Kubernetes Clusters with Upstream e2e tests (20)

Viavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptxViavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptx
 
Auto sre with keptn
Auto sre with keptnAuto sre with keptn
Auto sre with keptn
 
Automated Virtualized Testing (AVT) with Docker, Kubernetes, WireMock and Gat...
Automated Virtualized Testing (AVT) with Docker, Kubernetes, WireMock and Gat...Automated Virtualized Testing (AVT) with Docker, Kubernetes, WireMock and Gat...
Automated Virtualized Testing (AVT) with Docker, Kubernetes, WireMock and Gat...
 
Aerospace maintenance facility increases utilization by 50%, saves money
Aerospace maintenance facility increases utilization by 50%, saves moneyAerospace maintenance facility increases utilization by 50%, saves money
Aerospace maintenance facility increases utilization by 50%, saves money
 
Technology Primer: Closing the DevOps Loop by Integrating CA Application Perf...
Technology Primer: Closing the DevOps Loop by Integrating CA Application Perf...Technology Primer: Closing the DevOps Loop by Integrating CA Application Perf...
Technology Primer: Closing the DevOps Loop by Integrating CA Application Perf...
 
Testlink Test Management with Teamforge
Testlink Test Management with TeamforgeTestlink Test Management with Teamforge
Testlink Test Management with Teamforge
 
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...
 
Chicago DevOps Meetup Nov2019
Chicago DevOps Meetup Nov2019Chicago DevOps Meetup Nov2019
Chicago DevOps Meetup Nov2019
 
Journey toward3rdplatform
Journey toward3rdplatformJourney toward3rdplatform
Journey toward3rdplatform
 
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...
Compute Cloud Performance Showdown: 18 Months Later (OCI, AWS, IBM Cloud, GCP...
 
Tempest scenariotests 20140512
Tempest scenariotests 20140512Tempest scenariotests 20140512
Tempest scenariotests 20140512
 
Continuous Delivery Agiles 2014 Medellin
Continuous Delivery Agiles 2014 MedellinContinuous Delivery Agiles 2014 Medellin
Continuous Delivery Agiles 2014 Medellin
 
1 DevOp vs 1.000 servers - Amazon EC2 and Chef automation intro
1 DevOp vs 1.000 servers - Amazon EC2 and Chef automation intro1 DevOp vs 1.000 servers - Amazon EC2 and Chef automation intro
1 DevOp vs 1.000 servers - Amazon EC2 and Chef automation intro
 
Anatomy of a Build Pipeline
Anatomy of a Build PipelineAnatomy of a Build Pipeline
Anatomy of a Build Pipeline
 
Continuous Deployment of your Application @JUGtoberfest
Continuous Deployment of your Application @JUGtoberfestContinuous Deployment of your Application @JUGtoberfest
Continuous Deployment of your Application @JUGtoberfest
 
JCON_15FactorWorkshop.pptx
JCON_15FactorWorkshop.pptxJCON_15FactorWorkshop.pptx
JCON_15FactorWorkshop.pptx
 
Architecting Applications
Architecting ApplicationsArchitecting Applications
Architecting Applications
 
Continuous Deployment of your Application @SpringOne
Continuous Deployment of your Application @SpringOneContinuous Deployment of your Application @SpringOne
Continuous Deployment of your Application @SpringOne
 
Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)
 
Cerberus_Presentation1
Cerberus_Presentation1Cerberus_Presentation1
Cerberus_Presentation1
 

Recently uploaded

The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 

Recently uploaded (20)

The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Call Girls In Mukherjee Nagar đŸ“± 9999965857 đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar đŸ“±  9999965857  đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar đŸ“±  9999965857  đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar đŸ“± 9999965857 đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 

Verify Your Kubernetes Clusters with Upstream e2e tests

  • 1. Verify Your Kubernetes Clusters with Upstream e2e Tests Kenichi Omichi - NEC Open Source Summit NA 2018 8/29/2018
  • 2. Agenda About me Upstream e2e tests Motivation to use e2e tests for production How to use e2e tests Test failures, solutions and workarounds - PV Protection tests - StatefulSet tests - HPA tests - MetricsGrabber tests - Resource usage tracking tests Summary
  • 3. 3 © NEC Corporation 2018 About me ▌Kenichi Omichi from NEC ▌Software engineer ▌Open source contributor ▌OpenStack ïŹOpenStack/QA core developer, ex-PTL (Project Team Lead) ïŹOpenStack/Nova core developer ▌Kubernetes ïŹEstablished community member ïŹCKA (Certified Kubernetes Administrator)
  • 4. 4 © NEC Corporation 2018 Upstream e2e tests ▌Test end-to-end (e2e) behavior of system level ïŹUnit tests cannot catch unforeseen changes at system level sometimes ▌CI for pull requests ïŹPull requests need to pass subset of e2e tests before merging ▌e2e tests continue increasing ïŹThe latest k8s(for v1.12) contains 1033 tests 600 700 800 900 1000 1100 v1.7 v1.8 v1.9 v1.10 v1.11 master e2e specs
  • 5. 5 © NEC Corporation 2018 Motivation to use e2e tests for production ▌k8s contains a lot of features and a lot of corresponding e2e tests ïŹE2e test number is still increasing ▌e2e tests are useful for verifying k8s clusters ïŹWe can find misconfigurations before productions ïŹRarely you face test failures ▌Investigations of test failures are good chances to learn k8s deeply ïŹWe can understand overview/detail of k8s features and test expectations
  • 6. 6 © NEC Corporation 2018 How to use e2e tests – Build e2e test binary ▌Check version of target k8s cluster ▌Download k8s source code and checkout same version ▌Build e2e test binary $ go get k8s.io/kubernetes $ cd $GOPATH/src/k8s.io/kubernetes $ git checkout refs/tags/v1.11.1 $ kubectl version 
 Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", $ sudo /usr/local/go/bin/go run hack/e2e.go -- --build
  • 7. 7 © NEC Corporation 2018 How to use e2e tests – Run all e2e tests ▌Choose provider type for your cloud platform: ïŹskeleton, local, gce, gke, aws ▌(skeleton) Export KUBE_MASTER_IP and KUBE_MASTER ▌Run all e2e tests $ export KUBE_MASTER_IP="192.168.1.108:6443" $ export KUBE_MASTER=k8s-master $ export KUBECONFIG=$HOME/admin.conf $ go run hack/e2e.go -- --provider=skeleton --test
  • 8. 8 © NEC Corporation 2018 How to use e2e tests – Run Conformance tests ▌What are conformance tests ïŹA subset of e2e tests (167/999 in v1.11) ïŹCorresponding tests for interoperable features sig-architecture has approved ïŹk8s community expects to pass on any k8s clusters ïŹTo get certification of Certified Kubernetes program, every vendor needs to pass conformance tests ▌Run conformance tests $ export KUBECONFIG=$HOME/admin.conf $ export KUBERNETES_CONFORMANCE_TEST=true $ go run hack/e2e.go -- --provider=skeleton --test --test_args="--ginkgo.focus=[Conformance]"
  • 9. 9 © NEC Corporation 2018 We would face many failures ▌Conformance tests can be operated easily ïŹJust 167 tests of 999 all e2e tests ïŹThese tests have been selected by expecting to be passed on any clusters ïŹIt takes 1.5 hours only ▌But a little difficult to run ALL e2e tests sucessully ïŹIt could take 12 hours if many failures ïŹMuch timeout without any configuration, retry-number and timeout are hardcord- ed in test code There is a lot of points we need to take care to use e2e tests for productions
  • 10. Test failures, Solutions and Workarounds
  • 11. 11 © NEC Corporation 2018 Test environment ▌HW ïŹDell desktop PCs * 2: Core i5-6500 CPU @3.2GHz, 16GB MEM ïŹIntel NUC PCs *2: Core i3-7100 @2.4GHz, 32GB MEM ▌IaaS: OpenStack Queens ïŹOS: Ubuntu 16.04 LTS ïŹUbuntu package ▌Kubernetes v1.11.1 ïŹ2 VMs(master, node) on top of IaaS ïŹOS: Ubuntu 16.04 LTS ïŹInstall tool: kubeadm ïŹNetwork: Flannel CTRLCPU01 CPU02CPU03 OpenStack IaaS master node
  • 12. 12 © NEC Corporation 2018 Failure case 1: PV Protection tests ▌2 PV Protection tests were failed ▌Root Cause ïŹTests were failed to check PV Protection flag in PV due to nonexistent flag ïŹThe flag is set after the PV phase becomes “Available” from “Pending”, but tests didn’t check the phase ïŹe2e test machine was faster than k8s cluster, so the flag was nonexistent ▌Solution ïŹAdd waiting for “Avaiable” phase before checking the flag in tests ïŹPR: https://github.com/kubernetes/kubernetes/pull/67520 (Merged 8/20/2018) Verify "immediate" deletion of a PV that is not bound to a PVC Verify that PV bound to a PVC is not removed immediately Persistent Volume phase: Pending -> Available finalizers: kubernetes.io/pv- protection
  • 13. 13 © NEC Corporation 2018 You might want to backport the PR to e2e ▌But e2e doesn’t start if changing e2e code due to version difference ▌Specify --check-version-skew=false to skip version check $ go run hack/e2e.go -- --provider=skeleton --test error: server version (&version.Info{Major:"1", Minor:"11", ..) differs from client version (version.Info{Major:"1", Minor:"11+", ..) exit status 1 $ go run hack/e2e.go -- --provider=skeleton –test --check-version-skew=false Client Version: version.Info{Major:"1", Minor:"11+", ... Server Version: version.Info{Major:"1", Minor:"11", ... ... Running Suite: Kubernetes e2e suite
  • 14. 14 © NEC Corporation 2018 Failure case 2: StatefulSet tests ▌3 StatefulSet tests were failed ▌Root Cause ïŹSince v1.11 k/k in-tree openstack-cloud-provider is deprecated, and the external openstack-cloud-provider is recommended ïŹDocument issue: Tried using the external one for PersistentVolume, but misconfigured StorageClass because StorageClass example was written for the internal one ▌Solution ïŹSpecify “openstack.org/standalone-cinder” instead of “kubernetes.io/cinder” as the provisioner in StorageClass ïŹPR: https://github.com/kubernetes/cloud-provider-openstack/pull/261 (Merged) should adopt matching orphans and release non-matching pods should not deadlock when a pod's predecessor fails should provide basic identity
  • 15. 15 © NEC Corporation 2018 Failure case 3: HPA tests - 1 ▌8 HPA(Horizontal Pod Autoscaling) tests were failed ▌How does HPA work [1] ReplicationController light [It] Should scale from 1 pod to 2 pods [2] ReplicationController light [It] Should scale from 2 pods to 1 pod [3] Should scale from 1 pod to 3 pods and from 3 to 5 [4] Should scale from 5 pods to 3 pods and from 3 to 1 [5] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5 [6] ReplicaSet [It] Should scale from 5 pods to 3 pods and from 3 to 1 [7] ReplicationController [It] Should scale from 1 pod to 3 pods and from 3 to 5 and verify decision stability [8] ReplicationController [It] Should scale from 5 pods to 3 pods and from 3 to 1 and verify decision stability HPA RC/RS/Deployment scale metrics.k8s.io API pod pod pod
 fetchupdate
  • 16. 16 © NEC Corporation 2018 Failure case 3: HPA tests - 2 ▌Root Cause ïŹHPA feature requires metrics-server to fetch CPU usage since k8s v1.9, but I didn’t deploy it ▌Solution ïŹDeploy metrics-server HPA metrics.k8s.io API fetch metrics-server provide $ git clone https://github.com/kubernetes-incubator/metrics-server $ cd metrics-server/ $ kubectl create -f deploy/1.8+/ kubelet kubelet kubelet fetch
  • 17. 17 © NEC Corporation 2018 Failure case 3: HPA tests - 3 ▌Still 6 HPA(Horizontal Pod Autoscaling) tests were failed ▌What does test [3] expect the result in autoscaling 1 to 3 pods [1] ReplicationController light [It] Should scale from 1 pod to 2 pods [2] ReplicationController light [It] Should scale from 2 pods to 1 pod [3] Should scale from 1 pod to 3 pods and from 3 to 5 [4] Should scale from 5 pods to 3 pods and from 3 to 1 [5] Deployment [It] Should scale from 1 pod to 3 pods and from 3 to 5 [6] ReplicaSet [It] Should scale from 5 pods to 3 pods and from 3 to 1 [7] ReplicationController [It] Should scale from 1 pod to 3 pods and from 3 to 5 and verify decision stability [8] ReplicationController [It] Should scale from 5 pods to 3 pods and from 3 to 1 and verify decision stability HPA RC/RS/Deployment scale pod Solved 1. Set target cpu: 20% 2. Make 50% CPU workload 3. Scale to 3 pods: 50% / 20% = 2.5 pod pod
  • 18. 18 © NEC Corporation 2018 Failure case 3: HPA tests - 4 ▌Root Cause ïŹActual CPU workload (100%) was bigger than expected (50%) ïŹThen HPA scaled pods to 5 instead of expected 3: 100% / 20% = 5 ïŹCPU workload is generated by many resource consumer processes. Each process tried to make 2% CPU workload and 25 processes existed, but it is hard to make such small workload and actual workload was 4%: Test expectation: 50% = 2% * 25 processes Actual workload: 100% = 4% * 25 processes ▌Solution ïŹChange the CPU workload of each process to 10% for the test stability 50% = 10% * 2 processes + 5 * processes ïŹPR: https://github.com/kubernetes/kubernetes/pull/67053 (Merged 8/7/2018) ïŹAll HPA tests have passed successfully
  • 19. 19 © NEC Corporation 2018 Failure case 4: MetricsGrabber test ▌1 MetricsGrabber test was failed ïŹk8s API call in the test was failed ▌Root Cause ïŹkube-scheduler listens 10251/tcp locally(127.0.0.1) and it denies connections to its node ip-address. ▌Workaround ïŹChange listening ip-address to the host address in /etc/kubernetes/manifests/kube- scheduler.yaml ïŹIssue: https://github.com/kubernetes/kubernetes/issues/67685 should grab all metrics from a Scheduler. GET /v1/namespaces/kube-system/pods/ kube-scheduler-k8s-master:10251/proxy/metrics We can get to know what APIs are called in tests with “-v=8”: $ go run hack/e2e.go -- --test --test_args="-v=8"
  • 20. 20 © NEC Corporation 2018 Failure case 5: Resource usage tracking tests - 1 ▌2 Resource usage tracking tests were failed ïŹTests expect CPU usage of kubelet is under 10%, but it was over 12% ▌Root Cause ïŹk8s VMs were running on NUC PCs ïŹCPU speed is not enough ‱ NUC PCs: i3-7100 @2.4GHz ‱ Desktop PCs: i5-6500 CPU @3.2GHz resource tracking for 0 pods per node resource tracking for 100 pods per node CTRLCPU01 CPU02CPU03 master node
  • 21. 21 © NEC Corporation 2018 Failure case 5: Resource usage tracking tests - 2 ▌Solution ïŹMigrate VMs to another host which has better CPUs ïŹQuestion: Isn’t such test environment supported with e2e test? ïŹIssue: https://github.com/kubernetes/kubernetes/issues/67621 CTRLCPU01 CPU02CPU03 master node
  • 22. 22 © NEC Corporation 2018 Summary ▌Comformance tests are useful for verifying basic functions ▌All e2e tests are useful for verifying extended functions (HPA, StatefulSets, etc.) ïŹWe can find misconfigurations before productions ïŹRarely you face test failures ▌Investigations of test failures are good chance to learn k8s deeply ïŹWe can understand overview/detail of k8s features and test expectations ▌Great to bring test issues up to community ïŹFew tests can run on some environments (upstream CI) ïŹDiscuss test results which are gotten from different environments, we can make tests more stable ïŹThat is great contribution to community by making development smooth
  • 23. 23 © NEC Corporation 2018 References ▌e2e tests ïŹhttps://github.com/kubernetes/community/blob/master/contributors/devel/e2e- tests.md ▌Conformance tests ïŹhttps://github.com/kubernetes/community/blob/master/contributors/devel/confor mance-tests.md ▌Certified Kubernetes Program ïŹhttps://www.cncf.io/certification/software-conformance/