Prometheus on NKS Guide
📌 QA test region: KR (Korea)
https://github.com/sysnet4admin
Installing Helm v3.10.3
1. Check that the helm binary is installed (if Helm is not installed, install it first)
root@k8s-console:~# helm version
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
version.BuildInfo{Version:"v3.10.3", GitCommit:"835b7334cfe2e5e27870ab3ed4135f136eecc704", GitTreeState:"clean", GoVersion:"go1.18.9"}
❗ If you do not want to see the insecure warnings, restrict the kubeconfig file to its owner:
root@k8s-console:~# chmod 600 ~/.kube/config
root@k8s-console:~# helm version --short
v3.10.3+g835b733
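Helm prints these warnings when the kubeconfig file mode includes group or world read bits. As a minimal sketch, the snippet below reproduces the permission fix on a temporary file that stands in for ~/.kube/config, so nothing real is modified. The temporary file and the `mode` variable are illustrative and not part of the original guide, and the `stat -c` form assumes GNU coreutils.

```shell
# Sketch: restrict a kubeconfig-style file to its owner and confirm the mode.
# A temporary file stands in for ~/.kube/config (assumption for illustration).
kubeconfig="$(mktemp)"
chmod 644 "$kubeconfig"                 # simulate a group/world-readable file
chmod 600 "$kubeconfig"                 # owner read/write only
mode="$(stat -c '%a' "$kubeconfig")"    # GNU stat; BSD/macOS would need stat -f '%Lp'
echo "mode=$mode"
rm -f "$kubeconfig"
```

Any mode without group or world read bits clears the warnings; 600 is the conventional choice for a non-executable file.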
Preparation for deploying Prometheus with Helm
1. Add the Helm repo used to install Prometheus
root@k8s-console:~# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories
2. Update the repo to fetch the latest chart information
root@k8s-console:~# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈Happy Helming!⎈
3. Check the preconfigured storage classes
root@k8s-console:~# kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nks-block-storage (default) blk.csi.ncloud.com Delete WaitForFirstConsumer true 17d
nks-nas-csi nas.csi.ncloud.com Delete WaitForFirstConsumer true 17d
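The default storage class is the row marked `(default)`. As a rough sketch, it can be picked out of the listing with awk. The sample text below is copied from the output above so the snippet runs without a cluster; the variable names `sc_output` and `default_sc` are illustrative.

```shell
# Sketch: find the default storage class in `kubectl get storageclass` output.
# The sample is the listing shown above; on a live cluster you could instead
# use: sc_output="$(kubectl get storageclass)"
sc_output='NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nks-block-storage (default) blk.csi.ncloud.com Delete WaitForFirstConsumer true 17d
nks-nas-csi nas.csi.ncloud.com Delete WaitForFirstConsumer true 17d'

# The second whitespace-separated field is "(default)" only on the default row.
default_sc="$(printf '%s\n' "$sc_output" | awk '$2 == "(default)" { print $1 }')"
echo "default storage class: $default_sc"
```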
Deploying Prometheus
1. Deploy Prometheus to NKS with Helm
root@k8s-console:~# helm install prometheus \
  prometheus-community/prometheus \
  --set server.service.type="LoadBalancer" \
  --namespace=monitoring \
  --create-namespace
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
NAME: prometheus
LAST DEPLOYED: Sat Dec 17 17:03:41 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-server.monitoring.svc.cluster.local
Get the Prometheus server URL by running these commands in the same shell:
  NOTE: It may take a few minutes for the LoadBalancer IP to be available.
        You can watch the status of by running 'kubectl get svc --namespace monitoring -w prometheus-server'
  export SERVICE_IP=$(kubectl get svc --namespace monitoring prometheus-server -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo http://$SERVICE_IP:80
The Prometheus alertmanager can be accessed via port on the following DNS name from within your cluster:
prometheus-%!s(<nil>).monitoring.svc.cluster.local
Get the Alertmanager URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus,component=" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace monitoring port-forward $POD_NAME 9093
#################################################################################
######   WARNING: Pod Security Policy has been disabled by default since    #####
######            it deprecated after k8s 1.25+. use                        #####
######            (index .Values "prometheus-node-exporter" "rbac"          #####
###### .          "pspEnabled") with (index .Values                         #####
######            "prometheus-node-exporter" "rbac" "pspAnnotations")       #####
######            in case you still need it.                                #####
#################################################################################
The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
prometheus-prometheus-pushgateway.monitoring.svc.cluster.local
Get the PushGateway URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus-pushgateway,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace monitoring port-forward $POD_NAME 9091
For more information on running Prometheus, visit:
https://prometheus.io/
❗ To set the storage class explicitly, for example to use storage other than nks-block-storage, use the following form and replace the storageClass values with the storage class you want:
helm install prometheus prometheus-community/prometheus \
  --set alertmanager.persistentVolume.storageClass="nks-block-storage" \
  --set server.persistentVolume.storageClass="nks-block-storage" \
  --set server.service.type="LoadBalancer" \
  --namespace=monitoring \
  --create-namespace
2. Check the deployed pods and services
root@k8s-console:~# kubectl get po,svc -n monitoring
NAME READY STATUS RESTARTS AGE
pod/prometheus-alertmanager-0 1/1 Running 0 3m37s
pod/prometheus-kube-state-metrics-7cdcf7cc98-rsgcr 1/1 Running 0 3m37s
pod/prometheus-prometheus-node-exporter-5qpn4 1/1 Running 0 3m37s
pod/prometheus-prometheus-pushgateway-959d84d7f-8ztlm 1/1 Running 0 3m37s
pod/prometheus-server-54956c9cfb-wlvms 2/2 Running 0 3m37s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus-alertmanager ClusterIP 198.19.133.139 <none> 9093/TCP 3m38s
service/prometheus-alertmanager-headless ClusterIP None <none> 9093/TCP 3m38s
service/prometheus-kube-state-metrics ClusterIP 198.19.185.119 <none> 8080/TCP 3m37s
service/prometheus-prometheus-node-exporter ClusterIP 198.19.252.64 <none> 9100/TCP 3m37s
service/prometheus-prometheus-pushgateway ClusterIP 198.19.193.200 <none> 9091/TCP 3m37s
service/prometheus-server LoadBalancer 198.19.178.17 monitoring-prometheus-se-18ca9-15174488-e4dd7137207d.kr.lb.naverncp.com 80:32534/TCP 3m38s
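The chart NOTES query `.status.loadBalancer.ingress[0].ip`, but the EXTERNAL-IP column for prometheus-server here shows a hostname, so the `.ip` query may come back empty on NKS. The sketch below, under that assumption, builds the URL from whichever of the two fields is set. The `lb_url` helper is hypothetical, and the hostname is the one from the listing above.

```shell
# Sketch: build the service URL from a LoadBalancer ingress entry,
# preferring the IP and falling back to the hostname.
lb_url() {
  ip="$1"; host="$2"; port="$3"
  addr="${ip:-$host}"            # use the hostname when the IP is empty
  if [ -z "$addr" ]; then
    echo "pending"               # neither field is populated yet
    return 1
  fi
  echo "http://$addr:$port"
}

url="$(lb_url "" "monitoring-prometheus-se-18ca9-15174488-e4dd7137207d.kr.lb.naverncp.com" 80)"
echo "$url"

# On a live cluster (not run here) the two inputs would come from:
#   kubectl get svc -n monitoring prometheus-server -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
#   kubectl get svc -n monitoring prometheus-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```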
3. Check the deployed Prometheus
4. Check the queried metric data
5. List and delete the deployed Prometheus release
root@k8s-console:~# helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
prometheus monitoring 1 2022-12-17 17:03:41.29034263 +0900 KST deployed prometheus-19.0.2 v2.40.5
root@k8s-console:~# helm uninstall prometheus -n monitoring
release "prometheus" uninstalled
6. Verify that the Prometheus resources were deleted
root@k8s-console:~# helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
root@k8s-console:~#
root@k8s-console:~# kubectl get po,svc -n monitoring
No resources found in monitoring namespace.
Deploying Kube Prometheus Stack (hereafter, Prometheus Stack)
1. Deploy the Prometheus Stack to NKS with Helm
root@k8s-console:~# helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --set prometheus.service.type=LoadBalancer \
  --set grafana.service.type=LoadBalancer \
  --namespace=monitoring \
  --create-namespace
NAME: kube-prometheus-stack
LAST DEPLOYED: Sat Dec 17 17:14:15 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=kube-prometheus-stack"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
2. Check the deployed pods and services
root@k8s-console:~# kubectl get po,svc -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 1 (104s ago) 105s
pod/kube-prometheus-stack-grafana-77fd7cc8ff-57tp5 3/3 Running 0 114s
pod/kube-prometheus-stack-kube-state-metrics-579bf68b5-rj5ff 1/1 Running 0 114s
pod/kube-prometheus-stack-operator-64bc8bd9fd-2ggrs 1/1 Running 0 114s
pod/kube-prometheus-stack-prometheus-node-exporter-rv8b5 1/1 Running 0 115s
pod/prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 105s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 105s
service/kube-prometheus-stack-alertmanager ClusterIP 198.19.250.205 <none> 9093/TCP 115s
service/kube-prometheus-stack-grafana LoadBalancer 198.19.171.157 monitoring-kube-promethe-4b1de-15174529-f0806941ff3d.kr.lb.naverncp.com 80:31512/TCP 115s
service/kube-prometheus-stack-kube-state-metrics ClusterIP 198.19.173.244 <none> 8080/TCP 115s
service/kube-prometheus-stack-operator ClusterIP 198.19.134.58 <none> 443/TCP 115s
service/kube-prometheus-stack-prometheus LoadBalancer 198.19.233.72 monitoring-kube-promethe-5d777-15174528-c0eedcb927a3.kr.lb.naverncp.com 9090:32176/TCP 115s
service/kube-prometheus-stack-prometheus-node-exporter ClusterIP 198.19.202.67 <none> 9100/TCP 115s
service/prometheus-operated ClusterIP None <none> 9090/TCP 105s
❗ A major problem with the current Prometheus Stack
With the standalone Prometheus deployment, a PV and a PVC are created by default through the storage class (nks-block-storage), as shown below.
root@k8s-console:~# kubectl get pv -n monitoring
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-0d5a8305acee499e8a0d57245a 10Gi RWO Delete Bound monitoring/storage-prometheus-alertmanager-0 nks-block-storage 9m42s
pvc-6ae9e2442da2475295da9b1050 10Gi RWO Delete Bound monitoring/prometheus-server nks-block-storage 9m44s
root@k8s-console:~# kubectl get pvc -n monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-server Bound pvc-6ae9e2442da2475295da9b1050 10Gi RWO nks-block-storage 10m
storage-prometheus-alertmanager-0 Bound pvc-0d5a8305acee499e8a0d57245a 10Gi RWO nks-block-storage 10m
However, if you do not specify a storage class for the Prometheus Stack, it is deployed with an emptyDir, which is only temporary storage, instead of a PV and a PVC, as shown below.
root@k8s-console:~# kubectl get pv,pvc -n monitoring | grep prometheus-server
root@k8s-console:~#
root@k8s-console:~# kubectl get po -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -o yaml | grep volumes -A30
  volumes:
  - name: config
    secret:
      defaultMode: 420
      secretName: prometheus-kube-prometheus-stack-prometheus
  - name: tls-assets
    projected:
      defaultMode: 420
      sources:
      - secret:
          name: prometheus-kube-prometheus-stack-prometheus-tls-assets-0
  - emptyDir: {}
    name: config-out
  - configMap:
      defaultMode: 420
      name: prometheus-kube-prometheus-stack-prometheus-rulefiles-0
    name: prometheus-kube-prometheus-stack-prometheus-rulefiles-0
  - name: web-config
    secret:
      defaultMode: 420
      secretName: prometheus-kube-prometheus-stack-prometheus-web-config
  - emptyDir: {}
    name: prometheus-kube-prometheus-stack-prometheus-db
  - name: kube-api-access-g8rvd
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
<snipped>
From a production standpoint, you therefore need to configure the stack so that a storage class is used. This means deploying with additional settings in values.yaml (or forking the chart and modifying it).
See the following links for details.
Prometheus: https://github.com/prometheus-community/helm-charts/issues/186
Grafana: https://github.com/prometheus-community/helm-charts/issues/436
Helm values: https://helm.sh/docs/intro/using_helm/#customizing-the-chart-before-installing
If you really want to do this, see Appendix 1.
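The emptyDir check above can also be scripted. The sketch below extracts the names of emptyDir-backed volumes from a pod's YAML. It runs on an excerpt of the output shown above rather than a live cluster, and it assumes that `name:` directly follows `- emptyDir:`, as in that output. The variable names are illustrative.

```shell
# Sketch: list volumes backed by emptyDir in a pod spec.
# The excerpt is taken from the pod YAML above; on a live cluster the input
# could come from: kubectl get po -n monitoring <pod-name> -o yaml
volumes_yaml='  - emptyDir: {}
    name: config-out
  - name: web-config
    secret:
      secretName: prometheus-kube-prometheus-stack-prometheus-web-config
  - emptyDir: {}
    name: prometheus-kube-prometheus-stack-prometheus-db'

# For each "- emptyDir:" line, read the next line and print its value.
empty_dir_volumes="$(printf '%s\n' "$volumes_yaml" | awk '/- emptyDir:/ { getline; print $2 }')"
echo "$empty_dir_volumes"
```

If the volume whose name ends in `-db` appears in this list, the Prometheus data directory is not persistent and is lost when the pod is removed.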
3. Check the deployed Prometheus
❗ If you want to change scrapeInterval after deployment
$ kubectl get prometheus -n monitoring -o yaml | nl | grep scrap
57 scrapeInterval: 30s
$ kubectl edit prometheus -n monitoring
prometheus.monitoring.coreos.com/kube-prometheus-stack-prometheus edited
$ kubectl get prometheus -n monitoring -o yaml | nl | grep scrap
57 scrapeInterval: 2m
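`kubectl edit` is interactive. As a non-interactive alternative, a merge patch on the same Prometheus resource should give the same result. The sketch below only builds and prints the command, so it runs without a cluster. The `scrape_patch` helper is hypothetical; the resource name is the one reported by `kubectl edit` above.

```shell
# Sketch: build a merge patch that sets spec.scrapeInterval on the
# Prometheus custom resource, and print the kubectl command that applies it.
scrape_patch() {
  printf '{"spec":{"scrapeInterval":"%s"}}' "$1"
}

patch="$(scrape_patch 2m)"
echo "kubectl patch prometheus kube-prometheus-stack-prometheus -n monitoring --type merge -p '$patch'"
```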
4. Check the deployed Grafana and log in
ID: admin
Password: prom-operator
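5. Confirm that the preconfigured data source is Prometheus
6. To load a prebuilt dashboard, enter 13770 in the Import menu
7. Select Prometheus as the data source and click Import
8. Review the imported 13770 dashboard and fix the panels that show N/A or No data
9. (If needed) List and delete the deployed Prometheus Stack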
root@k8s-console:~# helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
kube-prometheus-stack monitoring 1 2022-12-17 17:14:15.264607955 +0900 KST deployed kube-prometheus-stack-43.1.1 0.61.1
root@k8s-console:~# helm uninstall -n monitoring kube-prometheus-stack
release "kube-prometheus-stack" uninstalled
Appendix 1
1. Generate a values file with helm inspect
$ helm inspect values prometheus-community/kube-prometheus-stack --version 43.1.1 > kube-prometheus-stack-43.1.1.values
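The version passed to `--version` matches the suffix of the CHART column in `helm list` (kube-prometheus-stack-43.1.1). As a small sketch, that suffix can be split off with shell parameter expansion. It assumes the version itself contains no hyphen, and the variable names are illustrative.

```shell
# Sketch: derive the chart version from a CHART column value of `helm list`.
chart="kube-prometheus-stack-43.1.1"
version="${chart##*-}"     # drop everything up to and including the last "-"
name="${chart%-*}"         # drop the last "-" and everything after it
echo "chart=$name version=$version"
```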
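2. Add and modify the required content in the generated values file
Line numbers may differ slightly depending on when you run the command and the order of your edits.
For reference, you can display line numbers in vi by running :set nu after opening the file.
Modify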
542     ## Storage is the definition of how storage will be used by the Alertmanager instances.
543     ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
544     ##
545     storage:
546       volumeClaimTemplate:
547         spec:
548           storageClassName: nks-block-storage
549           accessModes: ["ReadWriteOnce"]
550           resources:
551             requests:
552               storage: 50Gi
553     #   selector: {}
Add
697 ## Using default values from https://github.com/grafana/helm-charts/blob/main/charts/grafana/values.yaml
698 ##
699 grafana:
700   enabled: true
701   namespaceOverride: ""
702
703   # override configuration by hoon
704   persistence:
705     enabled: true
706     type: pvc
707     storageClassName: nks-block-storage
708     accessModes:
709     - ReadWriteOnce
710     size: 100Gi
711     finalizers:
712     - kubernetes.io/pvc-protection
Modify
726   ## Timezone for the default dashboards
727   ## Other options are: browser or a specific timezone, i.e. Europe/Luxembourg
728   ##
729   defaultDashboardsTimezone: utc
730
731   adminPassword: admin
732
Modify
2580     ## Prometheus StorageSpec for persistent data
2581     ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
2582     ##
2583     storageSpec:
2584     ## Using PersistentVolumeClaim
2585     ##
2586       volumeClaimTemplate:
2587         spec:
2588           storageClassName: nks-block-storage
2589           accessModes: ["ReadWriteOnce"]
2590           resources:
2591             requests:
2592               storage: 50Gi
2593     #   selector: {}
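Instead of editing the full generated file, the same settings could be kept in a small override file that contains only the changed keys. The sketch below writes such a file; the values are the ones shown in the edits above. The parent keys `alertmanager.alertmanagerSpec` and `prometheus.prometheusSpec` are assumed from the kube-prometheus-stack values layout, since the excerpts above do not show them, and the temporary file path is illustrative.

```shell
# Sketch: write a minimal values override with only the changed settings.
override_file="$(mktemp)"
cat > "$override_file" <<'EOF'
alertmanager:
  alertmanagerSpec:          # assumed parent key, not shown in the excerpt
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: nks-block-storage
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
grafana:
  adminPassword: admin
  persistence:
    enabled: true
    type: pvc
    storageClassName: nks-block-storage
    accessModes:
      - ReadWriteOnce
    size: 100Gi
prometheus:
  prometheusSpec:            # assumed parent key, not shown in the excerpt
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: nks-block-storage
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
EOF

# Quick sanity check: three components should reference the storage class.
sc_count="$(grep -c 'storageClassName: nks-block-storage' "$override_file")"
echo "storageClassName entries: $sc_count"
```

Such a file would be passed to `helm install` with `--values`, in the same way as the full values file.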
3. Run helm install
root@k8s-console:~# helm install \
  prometheus-community/kube-prometheus-stack \
  --set prometheus.service.type=LoadBalancer \
  --set grafana.service.type=LoadBalancer \
  --create-namespace \
  --namespace monitoring \
  --generate-name \
  --values kube-prometheus-stack-43.1.1.values
NAME: kube-prometheus-stack-1671267408
LAST DEPLOYED: Sat Dec 17 17:56:49 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=kube-prometheus-stack-1671267408"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
4. Check the Prometheus Stack created with the modified values
root@k8s-console:~# kubectl get po,svc,pv,pvc -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-kube-prometheus-stack-1671-alertmanager-0 2/2 Running 1 (24s ago) 36s
pod/kube-prometheus-stack-1671-operator-696ddf996d-2tbft 1/1 Running 0 37s
pod/kube-prometheus-stack-1671267408-grafana-75cf5cff79-hrs59 3/3 Running 0 37s
pod/kube-prometheus-stack-1671267408-kube-state-metrics-7b44cdrf8q9 1/1 Running 0 37s
pod/kube-prometheus-stack-1671267408-prometheus-node-exporter-npmpk 1/1 Running 0 37s
pod/prometheus-kube-prometheus-stack-1671-prometheus-0 2/2 Running 0 35s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 36s
service/kube-prometheus-stack-1671-alertmanager ClusterIP 198.19.141.183 <none> 9093/TCP 37s
service/kube-prometheus-stack-1671-operator ClusterIP 198.19.249.190 <none> 443/TCP 37s
service/kube-prometheus-stack-1671-prometheus LoadBalancer 198.19.189.46 monitoring-kube-promethe-94513-15174705-1fbb6ff1467d.kr.lb.naverncp.com 9090:30008/TCP 37s
service/kube-prometheus-stack-1671267408-grafana LoadBalancer 198.19.206.4 <pending> 80:31398/TCP 37s
service/kube-prometheus-stack-1671267408-kube-state-metrics ClusterIP 198.19.225.152 <none> 8080/TCP 37s
service/kube-prometheus-stack-1671267408-prometheus-node-exporter ClusterIP 198.19.191.119 <none> 9100/TCP 37s
service/prometheus-operated ClusterIP None <none> 9090/TCP 35s
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM
STORAGECLASS REASON AGE
persistentvolume/pvc-7c195a1da23d4755b21b6ed2db 50Gi RWO Delete Bound
monitoring/prometheus-kube-prometheus-stack-1671-prometheus-db-prometheus-kube-prometheus-stack-1671-prometheus-0
nks-block-storage 33s
persistentvolume/pvc-8c1c8c896efb40b6af8fe82a42 50Gi RWO Delete Bound
monitoring/alertmanager-kube-prometheus-stack-1671-alertmanager-db-alertmanager-kube-prometheus-stack-1671-alertma