0. Introduction

This article describes how to monitor a Kubernetes cluster using the Prometheus service discovery mechanism.

Monitoring components: Prometheus + Node-Exporter + cAdvisor + Kube-State-Metrics

1. Deploy the monitoring components

1.1 Download the source

git clone https://gitlab.com/KylinVnne/cc-monitoring.git

1.2 Install

kubectl apply -f ./
kubectl apply -f kube-state-metrics/
kubectl apply -f node-cadvisor/
kubectl apply -f node-exporter/
kubectl apply -f prometheus/

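These commands install Prometheus, Node-Exporter, cAdvisor, and Kube-State-Metrics. After installation, the configuration changes described below are still required before the stack works properly.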

2. Configure Prometheus service discovery

2.1 Grant access to the Kubernetes cluster

Manifest file: prometheus/prometheus-rbac.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ccmonitor
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - services/proxy
  - endpoints
  - endpoints/proxy
  - pods
  - pods/proxy
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ccmonitor
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ccmonitor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ccmonitor
subjects:
- kind: ServiceAccount
  name: ccmonitor
  namespace: monitoring

After saving the manifest above, apply it with kubectl to update the cluster configuration:

kubectl apply -f prometheus-rbac.yaml

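Once it has been created, locate the ServiceAccount created in the previous step.

Copy its token, which is used to populate the Secret prometheus-token.

Replace xxxxx in the manifest below with that token, then use the manifest to create prometheus-token. Values under data in a Secret must be base64-encoded.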

apiVersion: v1
kind: Secret
metadata:
  name: prometheus-token
  namespace: monitoring
data:
  k8s_token: xxxxx

After saving the manifest above, apply it with kubectl to update the Prometheus configuration:

kubectl apply -f prometheus-token.yaml
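Since values under data in a Secret have to be base64-encoded, the step of filling in xxxxx can be sketched roughly as follows. This is a minimal illustration, assuming the token was copied in raw form (for example from the output of kubectl describe secret); a token copied from kubectl get secret -o yaml is already encoded and should not be encoded again. The helper names encode_secret_value and render_token_secret, and the placeholder token, are hypothetical and not part of the repository.

```python
import base64


def encode_secret_value(raw_token: str) -> str:
    """Base64-encode a raw token for use under a Secret's `data` field."""
    return base64.b64encode(raw_token.strip().encode("utf-8")).decode("ascii")


def render_token_secret(raw_token: str) -> str:
    """Render the prometheus-token manifest with the encoded token filled in."""
    return (
        "apiVersion: v1\n"
        "kind: Secret\n"
        "metadata:\n"
        "  name: prometheus-token\n"
        "  namespace: monitoring\n"
        "data:\n"
        f"  k8s_token: {encode_secret_value(raw_token)}\n"
    )


if __name__ == "__main__":
    # "example-token" is a placeholder, not a real credential.
    print(render_token_secret("example-token"))
```

The printed manifest can be saved as prometheus-token.yaml and applied with the kubectl command shown above.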

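2.2 Edit the Prometheus configuration file

Prometheus performs service discovery by integrating with the Kubernetes API. The examples below show how to configure several of the Kubernetes service discovery modes that Prometheus supports.

2.2.1 API Servers

In the main Prometheus configuration file (prometheus.yml), define a separate scrape job. Under kubernetes_sd_configs, specify the api_server address and the access token; Prometheus then discovers the matching objects through the Kubernetes API and uses them as the targets of this job.

api_server address: obtain it from the "API Server public or internal connection endpoint" entry on the Alibaba Cloud Kubernetes cluster management page.

replacement: the api_server connection endpoint (without the protocol prefix).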

      - job_name: kubernetes-apiservers
        metrics_path: /metrics
        scheme: https
        kubernetes_sd_configs:
        - api_server: https://172.31.2.71:6443/ 
          role: endpoints
          bearer_token_file: /prometheus/token/k8s_token
          tls_config:
            insecure_skip_verify: true    
        bearer_token_file: /prometheus/token/k8s_token
        tls_config:
          insecure_skip_verify: true 
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          separator: ;
          regex: default;kubernetes;https
          replacement: $1
          action: keep
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: 172.31.2.71:6443
          action: replace
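To make the first relabel rule in this job more concrete, here is a rough Python sketch of how a keep action decides which discovered endpoints survive: the source label values are joined with the separator and the result must match the regex in full. It is only a simplified model of Prometheus relabeling, not its actual implementation; the function name keep_action and the sample endpoints are hypothetical.

```python
import re

# Values taken from the relabel rule in the job above.
SOURCE_LABELS = [
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]
REGEX = "default;kubernetes;https"


def keep_action(labels, source_labels=SOURCE_LABELS, regex=REGEX, separator=";"):
    """Return True if a discovered target survives the `keep` rule."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # Relabel regexes are anchored, so the whole joined value must match.
    return re.fullmatch(regex, joined) is not None


if __name__ == "__main__":
    # Hypothetical discovered endpoints, for illustration only.
    apiserver = {
        "__meta_kubernetes_namespace": "default",
        "__meta_kubernetes_service_name": "kubernetes",
        "__meta_kubernetes_endpoint_port_name": "https",
    }
    other = {
        "__meta_kubernetes_namespace": "kube-system",
        "__meta_kubernetes_service_name": "kube-dns",
        "__meta_kubernetes_endpoint_port_name": "dns",
    }
    print(keep_action(apiserver), keep_action(other))
```

In this model only the endpoint of the kubernetes service in the default namespace is kept; every other discovered endpoint is dropped before the second rule rewrites __address__.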

Once the job is configured, the corresponding targets appear in the Prometheus web UI.

2.2.2 Node Exporter

In the Kubernetes ecosystem, Node Exporter is widely used for host monitoring. It covers almost all common host metrics, such as CPU, memory, network, filesystem, and disk. The Node Exporter instances deployed on the cluster nodes expose their metrics on port 9100, so the job scrapes that port on every node. As before, define a separate scrape job in the main Prometheus configuration file (prometheus.yml).

Note: Node Exporter must be installed in the cluster before this job is added.

      - job_name: kubernetes-node-exporter
        metrics_path: /metrics
        scheme: https
        kubernetes_sd_configs:
        - api_server: https://172.31.2.71:6443/
          role: node
          bearer_token_file: /prometheus/token/k8s_token
          tls_config:
            insecure_skip_verify: true    
        bearer_token_file: /prometheus/token/k8s_token
        tls_config:
          insecure_skip_verify: true 
        relabel_configs:
        - separator: ;
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
          action: labelmap
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: 172.31.2.71:6443
          action: replace
        - source_labels: [__meta_kubernetes_node_name]
          separator: ;
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}:9100/proxy/metrics
          action: replace
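The three relabel rules in this job work together: node labels are copied onto the target, the address is pointed at the API server, and the metrics path is rewritten so that the scrape goes through the API server proxy to port 9100 on the node. The sketch below models that chain in Python to show the URL Prometheus ends up requesting. It is a simplified illustration rather than Prometheus's own relabeling code; the function names and the node name node-1 are hypothetical.

```python
import re

API_SERVER = "172.31.2.71:6443"  # address used in the job above


def relabel_node(discovered, port=9100):
    """Apply a simplified version of the job's three relabel rules."""
    labels = dict(discovered)
    # Rule 1 (labelmap): copy each node label to a plain label name.
    for name, value in discovered.items():
        match = re.fullmatch(r"__meta_kubernetes_node_label_(.+)", name)
        if match:
            labels[match.group(1)] = value
    # Rule 2 (replace): always send the request to the API server.
    labels["__address__"] = API_SERVER
    # Rule 3 (replace): proxy through the API server to the node's port.
    node = discovered.get("__meta_kubernetes_node_name", "")
    if re.fullmatch(r"(.+)", node):
        labels["__metrics_path__"] = f"/api/v1/nodes/{node}:{port}/proxy/metrics"
    return labels


def scrape_url(labels, scheme="https"):
    """Build the URL that would be scraped for the relabeled target."""
    return f"{scheme}://{labels['__address__']}{labels['__metrics_path__']}"


if __name__ == "__main__":
    # Hypothetical discovered node, for illustration only.
    node = {
        "__meta_kubernetes_node_name": "node-1",
        "__meta_kubernetes_node_label_kubernetes_io_hostname": "node-1",
    }
    print(scrape_url(relabel_node(node)))
```

For the hypothetical node above, the resulting URL is https://172.31.2.71:6443/api/v1/nodes/node-1:9100/proxy/metrics, which is why the job uses scheme https and the API server token even though Node Exporter itself serves plain HTTP.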

Once the job is configured, the corresponding targets appear in the Prometheus web UI.

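2.2.3 Node-Kubelet

The kubelet running on each Kubernetes node also exposes its own metrics, here through port 10255 (the kubelet read-only port). The job is configured as follows: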

      - job_name: kubernetes-node-kubelet
        metrics_path: /metrics
        scheme: https
        kubernetes_sd_configs:
        - api_server: https://172.31.2.71:6443/
          role: node
          bearer_token_file: /prometheus/token/k8s_token
          tls_config:
            insecure_skip_verify: true    
        bearer_token_file: /prometheus/token/k8s_token
        tls_config:
          insecure_skip_verify: true 
        relabel_configs:
        - separator: ;
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
          action: labelmap
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: 172.31.2.71:6443
          action: replace
        - source_labels: [__meta_kubernetes_node_name]
          separator: ;
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}:10255/proxy/metrics
          action: replace

Once the job is configured, the corresponding targets appear in the Prometheus web UI.

2.2.4 cAdvisor

cAdvisor collects basic resource-usage data for each container running on a node and for the node as a whole. The job is configured as follows:

      - job_name: kubernetes-cadvisor
        metrics_path: /metrics
        scheme: https
        kubernetes_sd_configs:
        - api_server: https://172.31.2.71:6443/
          role: node
          bearer_token_file: /prometheus/token/k8s_token
          tls_config:
            insecure_skip_verify: true    
        bearer_token_file: /prometheus/token/k8s_token
        tls_config:
          insecure_skip_verify: true 
        relabel_configs:
        - separator: ;
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
          action: labelmap
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: 172.31.2.71:6443
          action: replace
        - source_labels: [__meta_kubernetes_node_name]
          separator: ;
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
          action: replace

Once the job is configured, the corresponding targets appear in the Prometheus web UI.

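2.2.5 Kube-State-Metrics

Kube-State-Metrics collects information about the state of the cluster: it reads the status of resource objects from the API Server and converts it into metrics, focusing on the overall health of the objects inside the cluster. kube-state-metrics exposes metrics on ports 8080 and 8081. The job is configured as follows: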

      - job_name: 'kube-state-metrics'
        static_configs:
        - targets: 
          - 'kube-state-metrics:8080'
        - targets:
          - 'kube-state-metrics:8081'

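Once the job is configured, the corresponding targets appear in the Prometheus web UI.

2.3 Verify the configuration

At this point, metrics are being collected from Node-Exporter, Node-Kubelet, cAdvisor, Kube-State-Metrics, and Kube-ApiServer.

The complete Prometheus.yaml configuration file is as follows: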

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_timeout: 120s
      scrape_interval:     120s
      evaluation_interval: 120s
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
    rule_files:
    scrape_configs:
      - job_name: 'kube-state-metrics'
        static_configs:
        - targets: 
          - 'kube-state-metrics:8080'
        - targets:
          - 'kube-state-metrics:8081'

      - job_name: kubernetes-apiservers
        metrics_path: /metrics
        scheme: https
        kubernetes_sd_configs:
        - api_server: https://172.31.2.71:6443/ 
          role: endpoints
          bearer_token_file: /prometheus/token/k8s_token
          tls_config:
            insecure_skip_verify: true    
        bearer_token_file: /prometheus/token/k8s_token
        tls_config:
          insecure_skip_verify: true 
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          separator: ;
          regex: default;kubernetes;https
          replacement: $1
          action: keep
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: 172.31.2.71:6443
          action: replace

      - job_name: kubernetes-node-exporter
        metrics_path: /metrics
        scheme: https
        kubernetes_sd_configs:
        - api_server: https://172.31.2.71:6443/
          role: node
          bearer_token_file: /prometheus/token/k8s_token
          tls_config:
            insecure_skip_verify: true    
        bearer_token_file: /prometheus/token/k8s_token
        tls_config:
          insecure_skip_verify: true 
        relabel_configs:
        - separator: ;
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
          action: labelmap
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: 172.31.2.71:6443
          action: replace
        - source_labels: [__meta_kubernetes_node_name]
          separator: ;
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}:9100/proxy/metrics
          action: replace
          
      - job_name: kubernetes-node-kubelet
        metrics_path: /metrics
        scheme: https
        kubernetes_sd_configs:
        - api_server: https://172.31.2.71:6443/
          role: node
          bearer_token_file: /prometheus/token/k8s_token
          tls_config:
            insecure_skip_verify: true    
        bearer_token_file: /prometheus/token/k8s_token
        tls_config:
          insecure_skip_verify: true 
        relabel_configs:
        - separator: ;
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
          action: labelmap
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: 172.31.2.71:6443
          action: replace
        - source_labels: [__meta_kubernetes_node_name]
          separator: ;
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}:10255/proxy/metrics
          action: replace

      - job_name: kubernetes-cadvisor
        metrics_path: /metrics
        scheme: https
        kubernetes_sd_configs:
        - api_server: https://172.31.2.71:6443/
          role: node
          bearer_token_file: /prometheus/token/k8s_token
          tls_config:
            insecure_skip_verify: true    
        bearer_token_file: /prometheus/token/k8s_token
        tls_config:
          insecure_skip_verify: true 
        relabel_configs:
        - separator: ;
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
          action: labelmap
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: 172.31.2.71:6443
          action: replace
        - source_labels: [__meta_kubernetes_node_name]
          separator: ;
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
          action: replace

Once all jobs are configured, the status of each of these targets is visible in the Prometheus web UI.
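The same status information is also available from the Prometheus HTTP API at /api/v1/targets, which can be handy for a scripted check. The following is a minimal sketch that counts healthy and unhealthy targets per job; the base URL passed to fetch_targets is an assumption about where Prometheus is reachable, and the sample payload is hypothetical data shaped like that endpoint's response, not output from a real cluster.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_targets(base_url):
    """Fetch the target list from a Prometheus server (base_url is assumed)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/v1/targets") as resp:
        return json.load(resp)


def summarize_targets(payload):
    """Count active targets per job by health state (up / down / unknown)."""
    summary = defaultdict(lambda: {"up": 0, "down": 0, "unknown": 0})
    for target in payload["data"]["activeTargets"]:
        job = target["labels"].get("job", "")
        health = target.get("health", "unknown")
        if health not in ("up", "down"):
            health = "unknown"
        summary[job][health] += 1
    return {job: dict(counts) for job, counts in summary.items()}


# Hypothetical sample shaped like the /api/v1/targets response.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "kubernetes-apiservers"}, "health": "up"},
            {"labels": {"job": "kubernetes-node-exporter"}, "health": "up"},
            {"labels": {"job": "kubernetes-node-exporter"}, "health": "down"},
        ]
    },
}

if __name__ == "__main__":
    for job, counts in summarize_targets(SAMPLE).items():
        print(job, counts)
```

Against a live server, summarize_targets(fetch_targets(...)) would be called with the address of the Prometheus service, for example the one exposed on port 9090 by prometheus-service.yaml.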

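3. Overview of the Prometheus YAML manifests

Configuration file                            Description
namespace.yaml                                Creates the namespace, named monitoring
kube-state-metrics-cluster-role.yaml          ClusterRole for kube-state-metrics, i.e. the API objects it may access
kube-state-metrics-cluster-role-binding.yaml  Binds the kube-state-metrics ClusterRole to the ServiceAccount
kube-state-metrics-role.yaml                  Role for kube-state-metrics, i.e. the API objects it may access
kube-state-metrics-role-binding.yaml          Binds the kube-state-metrics Role to the ServiceAccount
kube-state-metrics-service-account.yaml       ServiceAccount that carries the kube-state-metrics permissions
kube-state-metrics-service.yaml               Service that exposes the kube-state-metrics ports
kube-state-metrics-deployment.yaml            kube-state-metrics application definition
node-cadvisor-daemonset.yaml                  node-cadvisor application definition
node-cadvisor-service.yaml                    Service that exposes port 8080
node-exporter-daemonset.yaml                  node-exporter application definition
node-exporter-service.yaml                    Service that exposes port 9100
prometheus-rbac.yaml                          Prometheus role definitions
prometheus-token.yaml                         Token Prometheus uses to call the Kubernetes API
prometheus-configmap.yaml                     Prometheus configuration
prometheus-deployment.yaml                    Prometheus application definition
prometheus-service.yaml                       Service that exposes Prometheus on port 9090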