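The OOM log can be seen with dmesg -T. It prints the name of the killed process, but the name alone often doesn't tell you right away which pod or Docker container it came from. A more precise clue is pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94, which is the pod's cgroup name; the part after the "pod" prefix is the pod UID.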

[Tue Jul  1 23:14:57 2025] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=be6bd05b65e1cc5ceddfddbe2e28f64f815c9ac3c2322258f8cbe7f8cb1e031b,mems_allowed=0-1,oom_memcg=/kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94,task_memcg=/kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/be6bd05b65e1cc5ceddfddbe2e28f64f815c9ac3c2322258f8cbe7f8cb1e031b,task=filebeat,pid=22940,uid=0
[Tue Jul  1 23:14:57 2025] Memory cgroup out of memory: Killed process 22940 (filebeat) total-vm:2837488kB, anon-rss:179276kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:984kB oom_score_adj:999
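
The pod UID can also be pulled out of the log automatically. Below is a minimal shell sketch. It assumes the cgroupfs-style paths seen in the log above (/kubepods/<qos>/pod<UID>/<container-id>); cgroup v2 or the systemd cgroup driver produce different paths. The function names are hypothetical helpers, not part of any tool.

```shell
# Sketch: extract cgroup info from kernel "oom-kill:" lines read on stdin.
# Assumes cgroupfs-style paths: /kubepods/<qos>/pod<UID>/<container-id>

# Print the oom_memcg path (the pod-level cgroup) from each matching line.
oom_memcg_paths() {
    grep -o 'oom_memcg=[^,]*' | sed 's/^oom_memcg=//'
}

# Print the container ID (last path component of task_memcg) from each line.
task_container_ids() {
    grep -o 'task_memcg=[^,]*' | sed 's#.*/##'
}

# Print the pod UID (the part after "pod") from each oom_memcg path on stdin.
pod_uid_from_memcg() {
    sed -n 's#.*/pod\([0-9a-f-]*\).*#\1#p'
}

# Usage on a node (needs permission to read the kernel log):
#   dmesg -T | oom_memcg_paths | pod_uid_from_memcg | sort -u
```

The container ID from task_memcg identifies the container that was OOM-killed. Once it has been restarted, that ID will usually no longer exist.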

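An OOM kill usually does not delete the pod. It only kills a process, and when that process is the container's main process, the container is typically restarted. The pod and its cgroup, named after the pod UID, stay in place. That means we can locate the pod by looking at the processes currently in that cgroup.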

cd /sys/fs/cgroupfs/systemd   # or: cd /sys/fs/cgroup/systemd
find . |grep pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/cgroup.procs
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/14cb70b4021e61562d2860b7e1f93af78c062cf319859899d74e5c794a4d109d
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/14cb70b4021e61562d2860b7e1f93af78c062cf319859899d74e5c794a4d109d/cgroup.procs
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/14cb70b4021e61562d2860b7e1f93af78c062cf319859899d74e5c794a4d109d/tasks
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/14cb70b4021e61562d2860b7e1f93af78c062cf319859899d74e5c794a4d109d/notify_on_release
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/14cb70b4021e61562d2860b7e1f93af78c062cf319859899d74e5c794a4d109d/cgroup.clone_children
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/tasks
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/notify_on_release
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c/cgroup.procs
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c/tasks
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c/notify_on_release
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c/cgroup.clone_children
./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/cgroup.clone_children
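
The search and the per-container lookup can be scripted. Below is a minimal sketch that finds the pod directory by UID and then prints each container directory together with the smallest PID in its tasks file. It assumes cgroup v1 with the cgroupfs driver layout shown in the listing above. With the systemd cgroup driver the directories are systemd slice names instead, so the -name pattern would need adjusting. find_pod_dir and list_pod_containers are hypothetical helper names.

```shell
# Sketch: locate a pod cgroup by UID and list its containers.
# Assumes cgroup v1, cgroupfs driver layout:
#   <root>/kubepods/<qos>/pod<UID>/<container-id>/tasks

# Print the first directory named pod<UID> under the given cgroup root.
find_pod_dir() {
    root=$1 uid=$2
    find "$root" -type d -name "pod$uid" 2>/dev/null | head -n 1
}

# For each container directory in the pod cgroup, print "<id> <smallest PID>".
# A container with an empty tasks file is printed with "-".
list_pod_containers() {
    pod_dir=$1
    for dir in "$pod_dir"/*/; do
        [ -f "$dir/tasks" ] || continue          # skip non-cgroup entries
        id=$(basename "$dir")
        min_pid=$(sort -n "$dir/tasks" | head -n 1)
        echo "$id ${min_pid:--}"
    done
}

# Usage on a node (as root):
#   list_pod_containers "$(find_pod_dir /sys/fs/cgroup/systemd 45f5facf-212a-42be-9ed1-2c3b4f4f0f94)"
```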

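The pod cgroup contains two containers: the pause container and the workload container. Pick either one and look up the smallest PID in its tasks file, which is usually also the first line. Note that the container ID from task_memcg in the OOM log (be6bd05b...) is no longer in the listing, because the restarted container was created with a new ID.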

[root@master10 systemd]# cat ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c/tasks
8308
[root@master10 systemd]# ps -ef|grep 8308
root 8308 8281 0 2024 ? 00:00:00 /pause
root 12471 14106 0 13:39 pts/0 00:00:00 grep --color=auto 8308
[root@master10 systemd]# ps -ef|grep 8281
root 8281 1 0 2024 ? 00:36:37 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c -address /run/containerd/containerd.sock
root 8308 8281 0 2024 ? 00:00:00 /pause
root 12992 14106 0 13:39 pts/0 00:00:00 grep --color=auto 8281
root 18281 13229 0 May14 ? 00:00:00 [timeout] <defunct>
root 28281 13229 0 Mar14 ? 00:00:00 [timeout] <defunct>
[root@master10 systemd]# docker ps|grep 9fb550
9fb5500296d2 registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.2 "/pause" 6 months ago Up 6 months k8s_POD_filebeat-s7dnb_elk-log_45f5facf-212a-42be-9ed1-2c3b4f4f0f94_1
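
The Docker container name itself encodes the pod. In the name above, the fields are the container name (POD for the pause container), pod name, namespace, pod UID and attempt count, separated by underscores. Below is a small sketch that splits such a name. It assumes the k8s_<container>_<pod>_<namespace>_<uid>_<attempt> format seen in the output above. parse_k8s_docker_name is a hypothetical helper.

```shell
# Sketch: split a Kubernetes-created Docker container name into its fields.
# Assumed format: k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt>
# A leading "/" (as printed by docker inspect's .Name) is stripped first.
parse_k8s_docker_name() {
    name=${1#/}
    printf '%s\n' "$name" | awk -F_ '$1 == "k8s" && NF == 6 {
        printf "container=%s pod=%s namespace=%s uid=%s attempt=%s\n", $2, $3, $4, $5, $6
    }'
}

# Usage on the node:
#   parse_k8s_docker_name "$(docker inspect --format '{{.Name}}' 9fb5500296d2)"
```

Names that don't match the assumed format produce no output instead of a wrong split.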

Once you know a PID, you can also inspect the process directly to identify the pod:

[root@master10 systemd]# strings /proc/8308/environ |grep HOSTNAME
HOSTNAME=master10.1.20.201

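Here the pod uses hostNetwork, so HOSTNAME is the node's hostname and doesn't identify the pod. In that case only the method above works. Without hostNetwork, HOSTNAME is normally the pod name, and you can read it directly.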
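
/proc/<pid>/environ stores entries separated by NUL bytes. Converting those to newlines lets you match one variable exactly, instead of grepping the output of strings. Below is a minimal sketch. environ_var is a hypothetical helper, and reading another process's environ on a real node generally requires root.

```shell
# Sketch: print the value of one variable from a NUL-separated environ file.
# Entries are split on NUL and the match is anchored at "NAME=", so a value
# that merely contains "NAME=" elsewhere is not picked up.
environ_var() {
    file=$1 name=$2
    tr '\000' '\n' < "$file" | sed -n "s/^$name=//p"
}

# Usage (as root):
#   environ_var /proc/8308/environ HOSTNAME
```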