Container Orchestration with Kubernetes: Deploying the Prometheus Monitoring System + Grafana


  In the previous post we talked about extending the native apiserver with a custom apiserver via the APIService resource; for a refresher see https://www.cnblogs.com/qiuhom-1874/p/14279850.html. Today we'll look at monitoring a Kubernetes cluster.

  In the previous post we used a custom apiserver, metrics-server, to extend the native apiserver so that kubectl top node/pod could report CPU and memory usage for nodes and for pods in a namespace. Those figures give a rough picture of resource consumption, and in a sense that is already a form of monitoring, but metrics-server only collects CPU and memory, which is not enough if we want to understand everything else that goes on in a node or a pod. For that we need a dedicated monitoring system for the cluster. Prometheus is a high-performance monitoring system built around three main components: the Retrieval component scrapes metric data and can work together with external exporters; the TSDB component is a time-series database that stores the metrics; and the HTTP server component exposes a RESTful API for client queries, listening on port 9090 by default.
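  As a quick illustration of that RESTful interface, a query can be issued with curl; this is only a sketch, assuming a reachable Prometheus server (in this article the server ends up exposed on NodePort 30090):

# ask the HTTP API which scrape targets are up
# (localhost:9090 from the server itself, or any node's IP with port 30090
#  once the NodePort service created later in this article is in place)
curl -s 'http://localhost:9090/api/v1/query?query=up'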

  Overall topology of the Prometheus monitoring system

[Figure: Prometheus monitoring system topology]

  Note: the figure above shows the topology of the Prometheus monitoring system. Pushgateway acts as a kind of proxy in front of the Prometheus retrieval component: it receives metrics from pods that actively push their data. Prometheus distinguishes between push and pull monitoring: in push mode the monitored target pushes data to the server, while in pull mode the target waits for the server to come and scrape it; by default Prometheus works in pull mode, i.e. the server actively scrapes its targets. Node-level metrics are collected with node-exporter, while container-level metrics are scraped from the kubelet's cAdvisor endpoint (see the scrape configuration later in this article). Alertmanager provides alerting for the Prometheus system, and the Prometheus web UI provides a query page.
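  Pushgateway itself is not deployed in this article, but to illustrate the push model: a target pushes its metrics to the gateway, and Prometheus then scrapes the gateway. A minimal sketch, with a placeholder gateway address and job name:

# push a single sample for job "demo_job" to a Pushgateway (placeholder address)
echo "demo_processed_total 42" | curl --data-binary @- http://pushgateway.example.com:9091/metrics/job/demo_job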

  Components of the Prometheus monitoring stack (a few example queries against these components follow the list)

  kube-state-metrics: exposes metrics about the state of Kubernetes objects, e.g. how many nodes the cluster has, how many pods are running, and so on;

  node-exporter: collects node-level metrics from each host;

  alertmanager: provides alerting for the Prometheus monitoring system;

  prometheus-server: stores and processes the metric data and exposes a RESTful API for queries;
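  To make the division of labour concrete, here are two illustrative PromQL expressions that can be run once everything below is deployed; the metric names are the ones commonly exposed by kube-state-metrics and node-exporter, so treat them as examples rather than an exhaustive list:

# from kube-state-metrics: number of pods per namespace
count(kube_pod_info) by (namespace)

# from node-exporter: fraction of CPU time spent non-idle per node over the last 5 minutes
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)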

  Annotations that control whether Prometheus scrapes a pod or service (an example snippet follows the list)

  prometheus.io/scrape: whether the target may be scraped; "true" allows scraping, "false" forbids it;

  prometheus.io/path: the URL path to scrape metrics from, usually /metrics;

  prometheus.io/port: the port to scrape metrics from;
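  A minimal sketch of what these annotations look like on a Service; the service name, selector and port are hypothetical and only serve to illustrate the three annotations:

apiVersion: v1
kind: Service
metadata:
  name: demo-app                      # hypothetical service, for illustration only
  annotations:
    prometheus.io/scrape: "true"      # allow Prometheus to scrape this target
    prometheus.io/path: "/metrics"    # path to scrape
    prometheus.io/port: "8080"        # port to scrape
spec:
  selector:
    app: demo-app
  ports:
  - port: 8080
    targetPort: 8080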

  Deploying the Prometheus monitoring stack

  1. Deploy kube-state-metrics

  Create the RBAC manifest for kube-state-metrics

[root@master01 kube-state-metrics]# cat kube-state-metrics-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions","apps"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kube-state-metrics-resizer
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions","apps"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
[root@master01 kube-state-metrics]#

  Note: the manifest above creates a ServiceAccount plus a ClusterRole and a Role, and binds the ServiceAccount to both roles so that it holds the corresponding permissions;

  Create the kube-state-metrics Service manifest

[root@master01 kube-state-metrics]# cat kube-state-metrics-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "kube-state-metrics"
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
[root@master01 kube-state-metrics]#

  Create the kube-state-metrics Deployment manifest

[root@master01 kube-state-metrics]# cat kube-state-metrics-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.0.0-beta
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
      version: v2.0.0-beta
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
        version: v2.0.0-beta
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v2.0.0-beta
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: k8s.gcr.io/addon-resizer:1.8.7
        resources:
          limits:
            cpu: 100m
            memory: 30Mi
          requests:
            cpu: 100m
            memory: 30Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        volumeMounts:
          - name: config-volume
            mountPath: /etc/config
        command:
          - /pod_nanny
          - --config-dir=/etc/config
          - --container=kube-state-metrics
          - --cpu=100m
          - --extra-cpu=1m
          - --memory=100Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=kube-state-metrics
      volumes:
        - name: config-volume
          configMap:
            name: kube-state-metrics-config
---
# Config map for resource configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-state-metrics-config
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
[root@master01 kube-state-metrics]#

  Apply the three manifests above to deploy the kube-state-metrics component

[root@master01 kube-state-metrics]# ls
kube-state-metrics-deployment.yaml  kube-state-metrics-rbac.yaml  kube-state-metrics-service.yaml
[root@master01 kube-state-metrics]# kubectl apply -f .
deployment.apps/kube-state-metrics created
configmap/kube-state-metrics-config created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics-resizer created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
[root@master01 kube-state-metrics]#

  Verify: were the pod and service created successfully?

[Screenshot: kubectl shows the kube-state-metrics pod and service running in kube-system]

  Note: the pod and service have been created as expected;

  Verify: can metric data be retrieved from port 8080 of the service at the /metrics path?

[Screenshot: metrics returned by the kube-state-metrics service on port 8080 at /metrics]

  Note: port 8080 of the service returns metric data at /metrics, so the kube-state-metrics component is deployed and working;
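  For reference, this check can also be run from a shell; the ClusterIP below is the one shown in the kubectl get svc output later, and the DNS name assumes the default cluster.local domain, so adjust both to your environment:

# from any node, via the ClusterIP of the kube-state-metrics service
curl -s http://10.110.110.216:8080/metrics | head
# or from inside a pod, via the service name
curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | head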

  2. Deploy node-exporter

  Create the node-exporter Service manifest

[root@master01 node_exporter]# cat node-exporter-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: kube-system
  annotations:
    prometheus.io/scrape: "true"
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "NodeExporter"
spec:
  clusterIP: None
  ports:
    - name: metrics
      port: 9100
      protocol: TCP
      targetPort: 9100
  selector:
    k8s-app: node-exporter
[root@master01 node_exporter]#

  Create the node-exporter DaemonSet manifest

[root@master01 node_exporter]# cat node-exporter-ds.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.0.1
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
      version: v1.0.1
  updateStrategy:
    type: OnDelete
  template:
    metadata:
      labels:
        k8s-app: node-exporter
        version: v1.0.1
    spec:
      priorityClassName: system-node-critical
      containers:
        - name: prometheus-node-exporter
          image: "prom/node-exporter:v1.0.1"
          imagePullPolicy: "IfNotPresent"
          args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
          ports:
            - name: metrics
              containerPort: 9100
              hostPort: 9100
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
          resources:
            limits:
              memory: 50Mi
            requests:
              cpu: 100m
              memory: 50Mi
      hostNetwork: true
      hostPID: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
[root@master01 node_exporter]#

  Note: the manifest above runs node-exporter with a DaemonSet controller. The pod shares the host's network and PID namespaces and tolerates the master node taint, so one node-exporter pod runs on every node of the cluster and collects that node's metrics;

  Apply the two manifests above to deploy node-exporter

[root@master01 node_exporter]# ls
node-exporter-ds.yml  node-exporter-service.yaml
[root@master01 node_exporter]# kubectl apply -f .
daemonset.apps/node-exporter created
service/node-exporter created
[root@master01 node_exporter]#

  Verify: were the pods and service created successfully?

[root@master01 node_exporter]# kubectl get pods -l "k8s-app=node-exporter" -n kube-system
NAME                  READY   STATUS    RESTARTS   AGE
node-exporter-6zgkz   1/1     Running   0          107s
node-exporter-9mvxr   1/1     Running   0          107s
node-exporter-jbll7   1/1     Running   0          107s
node-exporter-s7vvt   1/1     Running   0          107s
node-exporter-xmrjh   1/1     Running   0          107s
[root@master01 node_exporter]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        20m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  46h
node-exporter        ClusterIP   None             <none>        9100/TCP                 116s
[root@master01 node_exporter]#

  Verify: can metric data be retrieved from port 9100 on any node at the /metrics path?

[Screenshot: node-exporter metrics returned from a node on port 9100 at /metrics]

  Note: /metrics on port 9100 returns metric data, so the node-exporter component is deployed successfully;
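  The same check from a shell might look like this; replace the placeholder with any node's IP, since node-exporter binds to the host network on every node:

curl -s http://<node-ip>:9100/metrics | grep ^node_load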

  3. Deploy alertmanager

  Create the alertmanager PVC manifest

[root@master01 alertmanager]# cat alertmanager-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
spec:
#  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "2Gi"
[root@master01 alertmanager]#

  Create the backing PVs

[root@master01 ~]# cat pv-demo.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v1
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v1
    server: 192.168.0.99
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v2
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v2
    server: 192.168.0.99
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v3
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v3
    server: 192.168.0.99
[root@master01 ~]#

  Apply the manifest to create the PVs

[root@master01 ~]# kubectl apply -f pv-demo.yaml
persistentvolume/nfs-pv-v1 created
persistentvolume/nfs-pv-v2 created
persistentvolume/nfs-pv-v3 created
[root@master01 ~]# kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
nfs-pv-v1   5Gi        RWO,ROX,RWX    Retain           Available                                   4s
nfs-pv-v2   5Gi        RWO,ROX,RWX    Retain           Available                                   4s
nfs-pv-v3   5Gi        RWO,ROX,RWX    Retain           Available                                   4s
[root@master01 ~]#

  Create the alertmanager Service manifest

[root@master01 alertmanager]# cat alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Alertmanager"
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 9093
      nodePort: 30093
  selector:
    k8s-app: alertmanager
  type: "NodePort"
[root@master01 alertmanager]#

  Create the alertmanager ConfigMap manifest

[root@master01 alertmanager]# cat alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  alertmanager.yml: |
    global: null
    receivers:
    - name: default-receiver
    route:
      group_interval: 5m
      group_wait: 10s
      receiver: default-receiver
      repeat_interval: 3h
[root@master01 alertmanager]#
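  The configuration above only defines an empty default receiver, so alerts are grouped and routed but not delivered anywhere. As a sketch of how it could be extended, a webhook receiver might look like this (the URL is a placeholder, not something deployed in this article):

    receivers:
    - name: default-receiver
      webhook_configs:
      - url: http://alert-webhook.example.com/hook   # placeholder endpoint
    route:
      group_interval: 5m
      group_wait: 10s
      receiver: default-receiver
      repeat_interval: 3h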

  Create the alertmanager Deployment manifest

[root@master01 alertmanager]# cat alertmanager-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    k8s-app: alertmanager
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.14.0
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: alertmanager
      version: v0.14.0
  template:
    metadata:
      labels:
        k8s-app: alertmanager
        version: v0.14.0
    spec:
      priorityClassName: system-cluster-critical
      containers:
        - name: prometheus-alertmanager
          image: "prom/alertmanager:v0.14.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --config.file=/etc/config/alertmanager.yml
            - --storage.path=/data
            - --web.external-url=/
          ports:
            - containerPort: 9093
          readinessProbe:
            httpGet:
              path: /#/status
              port: 9093
            initialDelaySeconds: 30
            timeoutSeconds: 30
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: storage-volume
              mountPath: "/data"
              subPath: ""
          resources:
            limits:
              cpu: 10m
              memory: 50Mi
            requests:
              cpu: 10m
              memory: 50Mi
#        - name: prometheus-alertmanager-configmap-reload
#          image: "jimmidyson/configmap-reload:v0.1"
#          imagePullPolicy: "IfNotPresent"
#          args:
#            - --volume-dir=/etc/config
#            - --webhook-url=http://localhost:9093/-/reload
#          volumeMounts:
#            - name: config-volume
#              mountPath: /etc/config
#              readOnly: true
#          resources:
#            limits:
#              cpu: 10m
#              memory: 10Mi
#            requests:
#              cpu: 10m
#              memory: 10Mi
      volumes:
        - name: config-volume
          configMap:
            name: alertmanager-config
        - name: storage-volume
          persistentVolumeClaim:
            claimName: alertmanager
[root@master01 alertmanager]#

  Apply the four manifests above to deploy alertmanager

[root@master01 alertmanager]# ls
alertmanager-configmap.yaml  alertmanager-deployment.yaml  alertmanager-pvc.yaml  alertmanager-service.yaml
[root@master01 alertmanager]# kubectl apply -f .
configmap/alertmanager-config created
deployment.apps/alertmanager created
persistentvolumeclaim/alertmanager created
service/alertmanager created
[root@master01 alertmanager]#

  Verify: were the pod and service created successfully?

[root@master01 alertmanager]# kubectl get pods -l "k8s-app=alertmanager" -n kube-system
NAME                            READY   STATUS    RESTARTS   AGE
alertmanager-6546bf7676-lt9jq   1/1     Running   0          85s
[root@master01 alertmanager]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             92s
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        31m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  47h
node-exporter        ClusterIP   None             <none>        9100/TCP                 13m
[root@master01 alertmanager]#

  Verify: is alertmanager reachable on port 30093 of any node?

[Screenshot: the Alertmanager web UI reached via NodePort 30093]

  Note: the page above is reachable on that port, so alertmanager is deployed successfully;

  4. Deploy prometheus-server

  Create the Prometheus RBAC manifest

[root@master01 prometheus-server]# cat prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
[root@master01 prometheus-server]#

  Create the Prometheus Service manifest

[root@master01 prometheus-server]# cat prometheus-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/name: "Prometheus"
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
      nodePort: 30090
  selector:
    k8s-app: prometheus
  type: NodePort
[root@master01 prometheus-server]#

  Create the Prometheus ConfigMap manifest

[root@master01 prometheus-server]# cat prometheus-configmap.yaml
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  prometheus.yml: |
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090

    - job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-nodes-kubelet
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-nodes-cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __metrics_path__
        replacement: /metrics/cadvisor
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    - job_name: kubernetes-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
        - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
        - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
    alerting:
      alertmanagers:
      - kubernetes_sd_configs:
          - role: pod
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace]
          regex: kube-system
          action: keep
        - source_labels: [__meta_kubernetes_pod_label_k8s_app]
          regex: alertmanager
          action: keep
        - source_labels: [__meta_kubernetes_pod_container_port_number]
          regex:
          action: drop
[root@master01 prometheus-server]#

  Create the Prometheus StatefulSet manifest

[root@master01 prometheus-server]# cat prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.24.0
spec:
  serviceName: "prometheus"
  replicas: 1
  podManagementPolicy: "Parallel"
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: prometheus
      initContainers:
      - name: "init-chown-data"
        image: "busybox:latest"
        imagePullPolicy: "IfNotPresent"
        command: ["chown", "-R", "65534:65534", "/data"]
        volumeMounts:
        - name: prometheus-data
          mountPath: /data
          subPath: ""
      containers:
#        - name: prometheus-server-configmap-reload
#          image: "jimmidyson/configmap-reload:v0.1"
#          imagePullPolicy: "IfNotPresent"
#          args:
#            - --volume-dir=/etc/config
#            - --webhook-url=http://localhost:9090/-/reload
#          volumeMounts:
#            - name: config-volume
#              mountPath: /etc/config
#              readOnly: true
#          resources:
#            limits:
#              cpu: 10m
#              memory: 10Mi
#            requests:
#              cpu: 10m
#              memory: 10Mi
        - name: prometheus-server
          image: "prom/prometheus:v2.24.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.path=/data
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          # based on 10 running nodes with 30 pods each
          resources:
            limits:
              cpu: 200m
              memory: 1000Mi
            requests:
              cpu: 200m
              memory: 1000Mi
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: prometheus-data
              mountPath: /data
              subPath: ""
      terminationGracePeriodSeconds: 300
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
  volumeClaimTemplates:
  - metadata:
      name: prometheus-data
    spec:
#      storageClassName: standard
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: "5Gi"
[root@master01 prometheus-server]#

  Note: before applying the manifest above, make sure the available PVs are large enough for the 5Gi volume claim;
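  One more note on this manifest: prometheus-server is started with --web.enable-lifecycle, so later edits to the prometheus-config ConfigMap can be picked up without restarting the pod by calling the reload endpoint once the updated ConfigMap has been synced into the container (this is exactly what the commented-out configmap-reload sidecar would automate). A sketch, using the NodePort service created above:

curl -X POST http://<node-ip>:30090/-/reload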

  Apply the four manifests above to deploy the Prometheus server

[root@master01 prometheus-server]# ls
prometheus-configmap.yaml  prometheus-rbac.yaml  prometheus-service.yaml  prometheus-statefulset.yaml
[root@master01 prometheus-server]# kubectl apply -f .
configmap/prometheus-config created
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
statefulset.apps/prometheus created
[root@master01 prometheus-server]#

  Verify: were the pod and service created successfully?

[root@master01 prometheus-server]# kubectl get pods -l "k8s-app=prometheus" -n kube-system
NAME           READY   STATUS    RESTARTS   AGE
prometheus-0   1/1     Running   0          2m20s
[root@master01 prometheus-server]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             10m
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        40m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  47h
node-exporter        ClusterIP   None             <none>        9100/TCP                 22m
prometheus           NodePort    10.111.155.1     <none>        9090:30090/TCP           2m27s
[root@master01 prometheus-server]#

  Verify: is Prometheus reachable on port 30090 of any node?

[Screenshot: the Prometheus web UI reached via NodePort 30090]

  Note: the page above is reachable, so the Prometheus server is deployed correctly;

  Browse the collected metrics through this UI

[Screenshot: a metric selected in the Prometheus UI and rendered as a graph]

  Note: pick the metric you want to inspect and click Execute, and the corresponding graph is drawn. With that, the Prometheus monitoring system is fully deployed; next we deploy grafana and configure it to use Prometheus as its data source for displaying the monitoring data;
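  A couple of expressions worth trying in the query box; these are illustrative and rely only on the exporters deployed above:

# scrape targets currently up, grouped by job
sum(up) by (job)

# available memory per node, as reported by node-exporter
node_memory_MemAvailable_bytes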

  Deploying grafana

  Create the grafana deployment manifest

[root@master01 grafana]# cat grafana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: k8s.gcr.io/heapster-grafana-amd64:v5.0.4
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ca-certificates
          readOnly: true
        - mountPath: /var
          name: grafana-storage
        env:
#        - name: INFLUXDB_HOST
#          value: monitoring-influxdb
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          value: /
      volumes:
      - name: ca-certificates
        hostPath:
          path: /etc/ssl/certs
      - name: grafana-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: monitoring-grafana
  namespace: kube-system
spec:
  ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana
  type: "NodePort"
[root@master01 grafana]#

  Apply the manifest to deploy grafana

[root@master01 grafana]# ls
grafana.yaml
[root@master01 grafana]# kubectl apply -f .
deployment.apps/monitoring-grafana created
service/monitoring-grafana created
[root@master01 grafana]#

  Verify: were the pod and service created successfully?

[root@master01 grafana]# kubectl get pods -l "k8s-app=grafana" -n kube-system
NAME                                  READY   STATUS    RESTARTS   AGE
monitoring-grafana-6c74ccc5dd-grjzf   1/1     Running   0          87s
[root@master01 grafana]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             82m
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        112m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  2d
monitoring-grafana   NodePort    10.100.230.71    <none>        80:30196/TCP             92s
node-exporter        ClusterIP   None             <none>        9100/TCP                 94m
prometheus           NodePort    10.111.155.1     <none>        9090:30090/TCP           74m
[root@master01 grafana]#

  Note: the grafana service is exposed on NodePort 30196;

  Verify: is grafana reachable on the NodePort exposed by its service?

[Screenshot: the Grafana home page reached via NodePort 30196]

  Note: the page above is reachable, so grafana is deployed successfully;

  Configuring grafana

  1. Configure Prometheus as the grafana data source

[Screenshot: adding a Prometheus data source in the Grafana UI]
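  When filling in the data-source URL, either of the following should work; the in-cluster form assumes the default cluster.local DNS domain, and both refer to the prometheus Service created earlier:

# grafana runs inside the cluster, so the prometheus Service can be addressed directly
http://prometheus.kube-system.svc.cluster.local:9090
# alternatively, via any node and the NodePort
http://<node-ip>:30090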

  2. Create a monitoring dashboard

[Screenshots: creating a new dashboard and panel in the Grafana UI]

  Note: dashboard templates can be downloaded from the grafana.com website;

  After downloading a template file, import it into grafana

[Screenshots: importing the downloaded dashboard template into Grafana and selecting the Prometheus data source]

  Note: choose the downloaded template file, select the Prometheus data source and click Import. If panels show no data, it is because the metric names used by the template differ from the metric names present in our Prometheus; edit the template file (or the panel queries) so that they match the metric names in your own environment;
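  A common case of this mismatch: many older templates were written for node-exporter releases before v0.16, which used different metric names than the v1.0.1 deployed here. Rewriting the panel queries along these lines is usually enough (illustrative examples, not an exhaustive mapping):

# old template query                          ->  query matching node-exporter v1.0.1
node_memory_MemTotal                          ->  node_memory_MemTotal_bytes
sum(rate(node_cpu{mode!="idle"}[5m]))         ->  sum(rate(node_cpu_seconds_total{mode!="idle"}[5m]))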