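OOM events show up in the output of dmesg -T. The log prints the process name, but the name alone is often not enough to tell which pod or docker container was killed. The more precise identifier is pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94, which is the name of the pod's cgroup (the string "pod" followed by the pod UID).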
    [二 7月 1 23:14:57 2025] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=be6bd05b65e1cc5ceddfddbe2e28f64f815c9ac3c2322258f8cbe7f8cb1e031b,mems_allowed=0-1,oom_memcg=/kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94,task_memcg=/kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/be6bd05b65e1cc5ceddfddbe2e28f64f815c9ac3c2322258f8cbe7f8cb1e031b,task=filebeat,pid=22940,uid=0
    [二 7月 1 23:14:57 2025] Memory cgroup out of memory: Killed process 22940 (filebeat) total-vm:2837488kB, anon-rss:179276kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:984kB oom_score_adj:999
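Pulling the identifiers out of the log by eye is error-prone, so the extraction can be scripted. The following is a minimal sketch, assuming the cgroupfs-style path layout seen in the log above (`.../pod<UID>/<64-hex container ID>`); the function name `oom_victims` is made up for illustration, and nodes using the systemd cgroup driver name their cgroups differently and would not match.

```shell
# Hypothetical helper: read kernel log lines on stdin and, for every
# oom-kill line, print "<pod UID> <container ID>" taken from task_memcg=.
# Assumes cgroupfs-style paths: .../pod<36-char UID>/<64-hex container ID>
oom_victims() {
    sed -n 's#.*task_memcg=[^,]*/pod\([0-9a-f-]\{36\}\)/\([0-9a-f]\{64\}\),.*#\1 \2#p'
}
```

It can be fed directly from the kernel log, for example `dmesg -T | oom_victims`. Lines that are not oom-kill records produce no output.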
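An OOM kill normally does not delete the pod; it only causes the docker container to be restarted. The killed container's cgroup is gone after the restart, but the pod's cgroup remains, so the pod can be located through the processes that are running in it now.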
    cd /sys/fs/cgroupfs/systemd    # or: cd /sys/fs/cgroup/systemd
    find . |grep pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/cgroup.procs
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/14cb70b4021e61562d2860b7e1f93af78c062cf319859899d74e5c794a4d109d
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/14cb70b4021e61562d2860b7e1f93af78c062cf319859899d74e5c794a4d109d/cgroup.procs
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/14cb70b4021e61562d2860b7e1f93af78c062cf319859899d74e5c794a4d109d/tasks
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/14cb70b4021e61562d2860b7e1f93af78c062cf319859899d74e5c794a4d109d/notify_on_release
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/14cb70b4021e61562d2860b7e1f93af78c062cf319859899d74e5c794a4d109d/cgroup.clone_children
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/tasks
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/notify_on_release
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c/cgroup.procs
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c/tasks
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c/notify_on_release
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c/cgroup.clone_children
    ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/cgroup.clone_children
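Most of the find output above is cgroup control files; only the subdirectories of the pod's cgroup are container IDs. A narrower lookup could be sketched as follows. The function name `pod_containers` is hypothetical, and the sketch assumes GNU find (for `-mindepth` and `-maxdepth`) and the cgroup v1 directory layout shown above.

```shell
# Hypothetical helper: list the container IDs under a pod's cgroup.
#   $1 = root of the cgroup hierarchy to search (e.g. /sys/fs/cgroup/systemd)
#   $2 = pod UID as it appears after "pod" in the OOM log
pod_containers() {
    # Locate the pod's cgroup directory, then print only its direct
    # subdirectories; each one is named after a container ID.
    find "$1" -type d -name "pod$2" | while read -r pod_dir; do
        find "$pod_dir" -mindepth 1 -maxdepth 1 -type d -exec basename {} \;
    done | sort
}
```

Called as `pod_containers /sys/fs/cgroup/systemd 45f5facf-212a-42be-9ed1-2c3b4f4f0f94` on the node above, this would list the two container IDs that appear in the find output.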
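There are two containers: one is the pause container and the other is the working container. Pick either one and look it up by the smallest PID in its tasks file, which is usually also the first entry.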
    [root@master10 systemd]# cat ./kubepods/burstable/pod45f5facf-212a-42be-9ed1-2c3b4f4f0f94/9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c/tasks
    8308
    [root@master10 systemd]# ps -ef|grep 8308
    root      8308  8281  0  2024 ?        00:00:00 /pause
    root     12471 14106  0 13:39 pts/0    00:00:00 grep --color=auto 8308
    [root@master10 systemd]# ps -ef|grep 8281
    root      8281     1  0  2024 ?        00:36:37 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 9fb5500296d25956645a7d96b72a1adb6b5574dc738424fba0632c947f17525c -address /run/containerd/containerd.sock
    root      8308  8281  0  2024 ?        00:00:00 /pause
    root     12992 14106  0 13:39 pts/0    00:00:00 grep --color=auto 8281
    root     18281 13229  0 5月14 ?        00:00:00 [timeout] <defunct>
    root     28281 13229  0 3月14 ?        00:00:00 [timeout] <defunct>
    [root@master10 systemd]# docker ps|grep 9fb550
    9fb5500296d2   registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.2   "/pause"   6 months ago   Up 6 months   k8s_POD_filebeat-s7dnb_elk-log_45f5facf-212a-42be-9ed1-2c3b4f4f0f94_1
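The last two steps of the session above (taking the smallest PID, then reading the pod name out of the docker container name) can also be scripted. This is only a sketch: both function names are hypothetical, and the six-field name layout `k8s_<container>_<pod>_<namespace>_<pod UID>_<counter>` is inferred from the docker ps output above rather than taken from any documented contract.

```shell
# Hypothetical helper: smallest PID in a cgroup v1 tasks (or cgroup.procs) file.
first_pid() {
    sort -n "$1" | head -n 1
}

# Hypothetical helper: split a container name of the assumed form
#   k8s_<container>_<pod>_<namespace>_<pod UID>_<counter>
# into labelled fields. Names that do not fit the layout print nothing.
k8s_name_fields() {
    printf '%s\n' "$1" | awk -F_ '$1 == "k8s" && NF == 6 {
        printf "container=%s pod=%s namespace=%s uid=%s\n", $2, $3, $4, $5
    }'
}
```

For the container found above, `k8s_name_fields k8s_POD_filebeat-s7dnb_elk-log_45f5facf-212a-42be-9ed1-2c3b4f4f0f94_1` reports the pod filebeat-s7dnb in the namespace elk-log. The name itself can be obtained without grep noise through `docker ps --filter id=9fb550 --format '{{.Names}}'`.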
Of course, once the PID is known, the pod can also be identified by inspecting the process's own information directly.
    [root@master10 systemd]# strings /proc/8308/environ |grep HOSTNAME
    HOSTNAME=master10.1.20.201
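In this case the pod uses hostNetwork, so HOSTNAME is the node's hostname and says nothing about the pod; the cgroup lookup above is the only option. For a pod without hostNetwork, the pod name would be visible here directly.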
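The environment check can be wrapped in a small function. The sketch below reads the file with tr instead of strings, because /proc/<pid>/environ is a NUL-separated list and strings by default skips entries shorter than four characters; the function name `environ_get` is made up for illustration.

```shell
# Hypothetical helper: print the value of one variable from a
# NUL-separated environ file.
#   $1 = path such as /proc/<pid>/environ
#   $2 = variable name, e.g. HOSTNAME
environ_get() {
    # Turn NUL separators into newlines, then strip the "NAME=" prefix
    # from the matching entry and print what is left.
    tr '\0' '\n' < "$1" | sed -n "s/^$2=//p"
}
```

On the node above, `environ_get /proc/8308/environ HOSTNAME` would print master10.1.20.201, the same value the strings command returned.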