- A+
所属分类:linux技术
kube-promethues配置钉钉告警
前置:k8s部署kube-promethues
一.配置钉钉机器人
-
打开钉钉的智能群助手,点击添加机器人
-
选择自定义机器人
-
勾选加签,复制后保存
-
复制webhook地址后点击保存
二.编写dingtalk的yaml部署文件
vi dingtalk.yaml
apiVersion: v1 kind: Service metadata: name: dingtalk namespace: monitoring spec: selector: app: dingtalk ports: - name: http protocol: TCP port: 8060 targetPort: 8060 --- apiVersion: apps/v1 kind: Deployment metadata: name: dingtalk namespace: monitoring labels: app: dingtalk spec: replicas: 1 strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate selector: matchLabels: app: dingtalk template: metadata: labels: app: dingtalk spec: restartPolicy: "Always" containers: - name: dingtalk image: timonwong/prometheus-webhook-dingtalk:v2.1.0 imagePullPolicy: "IfNotPresent" volumeMounts: - name: dingtalk-conf mountPath: /etc/prometheus-webhook-dingtalk/ resources: limits: cpu: "400m" memory: "500Mi" requests: cpu: "100m" memory: "100Mi" ports: - containerPort: 8060 name: http protocol: TCP readinessProbe: failureThreshold: 3 periodSeconds: 5 initialDelaySeconds: 30 successThreshold: 1 tcpSocket: port: 8060 livenessProbe: tcpSocket: port: 8060 initialDelaySeconds: 30 periodSeconds: 10 volumes: - name: dingtalk-conf configMap: name: dingtalk-cm
prometheus-webhook-dingtalk是一个开源的钉钉告警的插件,目前最新版停留于v2.1.0
三.编写钉钉告警模板dingtalk-configmap.yaml
vi dingtalk-configmap.yaml
apiVersion: v1 kind: ConfigMap metadata: name: dingtalk-cm namespace: monitoring data: config.yml: |- templates: - /etc/prometheus-webhook-dingtalk/dingding.tmpl targets: webhook: url: https://oapi.dingtalk.com/robot/send?access_token=<复制的webhook地址> secret: "<加签的时候复制的secret>" message: text: '{{ template "dingtalk.to.message" . }}' dingding.tmpl: |- {{ define "dingtalk.to.message" }} {{- if gt (len .Alerts.Firing) 0 -}} {{- range $index, $alert := .Alerts -}} ========= **监控告警** ========= **告警集群:** k8s **告警类型:** {{ $alert.Labels.alertname }} **告警级别:** {{ $alert.Labels.severity }} **告警状态:** {{ .Status }} **故障主机:** {{ $alert.Labels.instance }} {{ $alert.Labels.device }} **告警主题:** {{ .Annotations.summary }} **告警详情:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}} **主机标签:** {{ range .Labels.SortedPairs }} </br> [{{ .Name }}: {{ .Value | markdown | html }} ] {{- end }} </br> **故障时间:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} ========= = **end** = ========= {{- end }} {{- end }} {{- if gt (len .Alerts.Resolved) 0 -}} {{- range $index, $alert := .Alerts -}} ========= **故障恢复** ========= **告警集群:** k8s **告警主题:** {{ $alert.Annotations.summary }} **告警主机:** {{ .Labels.instance }} **告警类型:** {{ .Labels.alertname }} **告警级别:** {{ $alert.Labels.severity }} **告警状态:** {{ .Status }} **告警详情:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}} **故障时间:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} **恢复时间:** {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} ========= = **end** = ========= {{- end }} {{- end }} {{- end }}
四.编写文件alertmanager-secret.yaml
该文件是 用来顶替原本kube-promethues部署时的,alertmanager的配置文件
vi alertmanager-secret.yaml
apiVersion: v1 data: { } kind: Secret metadata: name: alertmanager-main namespace: monitoring stringData: alertmanager.yaml: |- global: resolve_timeout: 5m route: group_by: ['alertname'] group_wait: 30s group_interval: 5m repeat_interval: 30m receiver: 'webhook' routes: - match: severity: 'info' continue: true receiver: 'null' - match: severity: 'none' continue: true receiver: 'null' receivers: - name: 'null' - name: 'webhook' webhook_configs: - send_resolved: true url: 'http://dingtalk:8060/dingtalk/webhook/send'
五.部署并检查是否运行成功
kubectl apply -f alertmanager-secret.yaml kubectl apply -f dingtalk-configmap.yaml kubectl apply -f dingtalk.yaml #查看是否部署成功 kubectl get pods -n monitoring | grep dingtalk
dingtalk部署成功后,重新部署alertmanager就行了。