By default, node_exporter only reports the total size of mounted filesystems; there is no simple configuration option to get the size of a specific directory. After digging through the docs, the only way is to have a script write the directory sizes to a .prom file and configure node_exporter's textfile collector to read metrics from that file. Since everything already runs on k8s, I decided to run a pod via a DaemonSet to do this work.
Prepare the image

This uses alpine as the base image and installs a few common tools. Only moreutils is strictly required, because it provides the sponge command used in the cron job below; if you'd rather not install it, replacing sponge with mv in the cron entry also works. The Dockerfile also sets the timezone and starts crond, both in preparation for the scheduled job.
```dockerfile
FROM alpine:3.18.4
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apk/repositories
RUN apk add --no-cache bash openssh curl mysql-client moreutils
RUN apk add tzdata \
    && cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
    && echo "Asia/Shanghai" > /etc/timezone
CMD crond -l 2 -f
```
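Why sponge (or mv)? Writing the .prom file in place would let node_exporter scrape a half-written file mid-update. sponge soaks up all of its input before writing the output file; the mv fallback achieves the same by writing to a temp file and renaming it over the target. A minimal sketch of the mv variant (the path and metric value here are placeholders, not from the real setup):

```shell
# Hypothetical sketch of the mv fallback: write to a temp file first, then
# rename it over the target, so a scraper never reads a half-written file.
outfile=/tmp/directory_size.prom
printf 'node_directory_size_bytes{directory="/data/log"} 123\n' > "${outfile}.tmp"
mv "${outfile}.tmp" "${outfile}"   # rename is atomic within one filesystem
cat "$outfile"
```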
The image is pushed to my company's private registry and is not public, so don't try to pull it directly; I'm just noting down the docker build commands here because I keep forgetting them.
```shell
docker build -t docker.shuyilink.com/operations:1.0 .
docker push docker.shuyilink.com/operations:1.0
```
Prepare the metrics script

Starting from the official script in the reference link, I made the following changes:
- Added a timeout to the du command, so a very large directory cannot hog I/O indefinitely; on timeout the directory size is set to 999999999, which also makes it easy for alerting to recognize later.
- Added a check that skips a directory if it does not exist.
- Dropped the `du --block-size=1 --summarize` options, because the du in the alpine image does not support them.
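One caveat with dropping `--block-size=1`: busybox du reports sizes in 1 KiB blocks, so node_directory_size_bytes actually carries KiB values here. If you want the metric name to stay honest, the awk step could scale the first field by 1024 — a hypothetical tweak, shown here with fake du output:

```shell
# Fake `du -s` output (4 one-KiB blocks); multiply by 1024 in awk to get bytes.
echo "4 /data/log" | awk -v dir="/data/log" \
  '{ printf "node_directory_size_bytes{directory=\"%s\"} %d\n", dir, $1 * 1024 }'
```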
```shell
#!/bin/bash
echo "# HELP node_directory_size_bytes Disk space used by some directories"
echo "# TYPE node_directory_size_bytes gauge"
# Time du out after 20 seconds
for dir in "$@"
do
  if [ -d "$dir" ]; then
    output=$(timeout 20 du -s "$dir" 2> /dev/null)
    # If du timed out or returned nothing, emit the sentinel value
    if [ -z "$output" ]; then
      echo "node_directory_size_bytes{directory=\"$dir\"} 999999999"
    else
      echo "$output" | awk -v dir="$dir" '{ printf "node_directory_size_bytes{directory=\"%s\"} %s\n", dir, $1 }'
    fi
  fi
done
```
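You can sanity-check the script locally before wiring it into k8s. A quick sketch (the /tmp paths are just for this demo): it runs the script against one existing and one missing directory, and the missing one should simply be skipped.

```shell
# Save the script above as /tmp/directory-size.sh for a local test run.
cat > /tmp/directory-size.sh <<'EOF'
#!/bin/bash
echo "# HELP node_directory_size_bytes Disk space used by some directories"
echo "# TYPE node_directory_size_bytes gauge"
for dir in "$@"
do
  if [ -d "$dir" ]; then
    output=$(timeout 20 du -s "$dir" 2> /dev/null)
    if [ -z "$output" ]; then
      echo "node_directory_size_bytes{directory=\"$dir\"} 999999999"
    else
      echo "$output" | awk -v dir="$dir" '{ printf "node_directory_size_bytes{directory=\"%s\"} %s\n", dir, $1 }'
    fi
  fi
done
EOF
mkdir -p /tmp/ds-demo
# Expect the two comment lines plus one sample for /tmp/ds-demo;
# the nonexistent directory produces no output at all.
bash /tmp/directory-size.sh /tmp/ds-demo /no/such/dir
```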
Prepare the k8s yaml

This mounts /data/log and /data/volume, the two directories whose sizes we want, into the container, mounts the script and the cron entry via ConfigMaps, and has the cron job write the results to /data/promfile/directory_size.prom.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sygloperator
  namespace: ops
spec:
  selector:
    matchLabels:
      app: sygloperator
  template:
    metadata:
      labels:
        app: sygloperator
    spec:
      containers:
        - name: sygloperator
          image: docker.shuyilink.com/operations:1.0
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 10m
              memory: 128Mi
          volumeMounts:
            - mountPath: /data/log
              name: datalog
            - mountPath: /etc/crontabs/root
              name: cronjob
              subPath: root
            - mountPath: /root/scripts
              name: scripts
            - mountPath: /data/volume
              name: datavolume
            - mountPath: /data/promfile
              name: promfile
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      volumes:
        - hostPath:
            path: /data/log
            type: DirectoryOrCreate
          name: datalog
        - configMap:
            defaultMode: 420
            name: cronjob-config
          name: cronjob
        - configMap:
            defaultMode: 420
            name: opscripts
          name: scripts
        - hostPath:
            path: /data/volume
            type: DirectoryOrCreate
          name: datavolume
        - hostPath:
            path: /data/promfile
            type: DirectoryOrCreate
          name: promfile
---
apiVersion: v1
data:
  root: |
    5 */1 * * * /bin/bash /root/scripts/directory-size.sh /data/log /data/volume | sponge /data/promfile/directory_size.prom
kind: ConfigMap
metadata:
  name: cronjob-config
  namespace: ops
---
apiVersion: v1
data:
  directory-size.sh: |-
    #!/bin/bash
    echo "# HELP node_directory_size_bytes Disk space used by some directories"
    echo "# TYPE node_directory_size_bytes gauge"
    for dir in "$@"
    do
      if [ -d "$dir" ]; then
        output=$(timeout 20 du -s "$dir" 2> /dev/null)
        if [ -z "$output" ]; then
          echo "node_directory_size_bytes{directory=\"$dir\"} 999999999"
        else
          echo "$output" | awk -v dir="$dir" '{ printf "node_directory_size_bytes{directory=\"%s\"} %s\n", dir, $1 }'
        fi
      fi
    done
kind: ConfigMap
metadata:
  name: opscripts
  namespace: ops
```
If everything is running correctly, a directory_size.prom file will be generated under /data/promfile/ on the host, with contents like this:
```
# HELP node_directory_size_bytes Disk space used by some directories
# TYPE node_directory_size_bytes gauge
node_directory_size_bytes{directory="/data/log"} 52609040
node_directory_size_bytes{directory="/data/volume"} 999999999
```
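If you want a quick sanity check that the generated file follows the Prometheus exposition format before pointing node_exporter at it, here is a minimal sketch; `check_prom` is a hypothetical helper, not part of the setup, and only covers simple `name{labels} value` gauge lines:

```python
import re

# Matches a simple sample line: metric name, optional {labels}, numeric value.
SAMPLE_RE = re.compile(r'^[a-zA-Z_:][a-zA-Z0-9_:]*(\{[^}]*\})?\s+-?[0-9.e+]+$')

def check_prom(text: str) -> list[str]:
    """Return the lines that look like neither comments nor samples."""
    bad = []
    for line in text.splitlines():
        if not line or line.startswith('#'):
            continue  # HELP/TYPE comments and blank lines are fine
        if not SAMPLE_RE.match(line):
            bad.append(line)
    return bad

content = '''# HELP node_directory_size_bytes Disk space used by some directories
# TYPE node_directory_size_bytes gauge
node_directory_size_bytes{directory="/data/log"} 52609040
node_directory_size_bytes{directory="/data/volume"} 999999999'''
print(check_prom(content))  # → [] (no malformed lines)
```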
Update the node_exporter configuration

Remember to add `--collector.textfile.directory=/rootfs/data/promfile` to node_exporter's arguments; adjust this path according to where your node_exporter mounts the host filesystem. My complete node_exporter yaml follows.
```yaml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: node-exporter
    k8s.kuboard.cn/name: node-exporter
  name: node-exporter
  namespace: ops
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: node-exporter
      version: v1.0.1
  template:
    metadata:
      labels:
        k8s-app: node-exporter
        version: v1.0.1
    spec:
      containers:
        - args:
            - '--path.procfs=/host/proc'
            - '--path.sysfs=/host/sys'
            - '--path.rootfs=/rootfs'
            - >-
              --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|boot|run|mnt)($|/)
            - '--no-collector.zfs'
            - '--no-collector.netclass'
            - '--no-collector.nfs'
            - '--no-collector.filesystem'
            - '--log.level=debug'
            - '--collector.textfile.directory=/rootfs/data/promfile'
          image: 'prom/node-exporter:v1.0.1'
          imagePullPolicy: IfNotPresent
          name: prometheus-node-exporter
          ports:
            - containerPort: 9100
              hostPort: 9100
              name: metrics
              protocol: TCP
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 10m
              memory: 128Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /host/proc
              name: proc
              readOnly: true
            - mountPath: /host/sys
              name: sys
              readOnly: true
            - mountPath: /rootfs
              name: rootfs
              readOnly: true
      dnsPolicy: ClusterFirst
      hostIPC: true
      hostNetwork: true
      hostPID: true
      nodeSelector:
        kubernetes.io/arch: amd64
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Exists
      volumes:
        - hostPath:
            path: /proc
            type: ''
          name: proc
        - hostPath:
            path: /sys
            type: ''
          name: sys
        - hostPath:
            path: /
            type: ''
          name: rootfs
        - hostPath:
            path: /dev
            type: ''
          name: dev
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: node-exporter
  namespace: ops
spec:
  clusterIP: None
  ports:
    - name: metrics
      port: 9100
      protocol: TCP
      targetPort: 9100
  selector:
    k8s-app: node-exporter
  sessionAffinity: None
  type: ClusterIP
```
Verify

curl the node_exporter :9100/metrics endpoint and you should see output like the following; alternatively, query the node_directory_size_bytes metric in Prometheus and it should return matching series. If it's missing, check the logs of the alpine pod and of node_exporter.
```
# HELP node_directory_size_bytes Disk space used by some directories
# TYPE node_directory_size_bytes gauge
node_directory_size_bytes{directory="/data/log"} 4.231016e+06
node_directory_size_bytes{directory="/data/volume"} 174324
```
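The 999999999 sentinel written on du timeout makes a simple alert possible. A sketch of a Prometheus alerting rule for it; the rule name, `for` duration, and severity label are placeholders to adapt to your own rule files:

```yaml
groups:
  - name: directory-size
    rules:
      # Fires when the script wrote the timeout sentinel instead of a real size.
      - alert: DirectorySizeScanTimeout
        expr: node_directory_size_bytes == 999999999
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: 'du timed out on {{ $labels.directory }} ({{ $labels.instance }})'
```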
References
node-exporter-textfile-collector-scripts/directory-size.sh at master · prometheus-community/node-exporter-textfile-collector-scripts (github.com)