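Yesterday we walked through Docker's call chain:
Docker CLI ==> dockerd ==> containerd ==> containerd-shim ==> runc ==> container
That left one question open. runc exits as soon as it has started the container, yet ps on the host shows containerd-shim as the container's parent process. This differs from what we know about orphan processes, which are adopted by the init process with PID 1 (see the experiment on Day 17). How does that work?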
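Linux provides an API called prctl, which adjusts certain behaviors of a process or thread. One of its options is PR_SET_CHILD_SUBREAPER. The documentation (the prctl(2) man page) describes it like this: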
If arg2 is nonzero, set the "child subreaper" attribute of the calling process; if arg2 is zero, unset the attribute.
A subreaper fulfills the role of init(1) for its descendant processes. When a process becomes orphaned (i.e., its immediate parent terminates), then that process will be reparented to the nearest still living ancestor subreaper.
In other words, this option lets a process act as the init process for its descendants. When one of its descendants becomes an orphan process, the process that has the child subreaper attribute adopts it.
Let's try it out by modifying the program from Day 16:
```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/prctl.h>

char* const argv_list[] = {
    "/bin/bash",
    NULL
};

int main(void)
{
    printf("current PID: %d\n", getpid());
    // Set the child subreaper attribute on the current process
    int ret = prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
    if (ret < 0) {
        perror("can't set child subreaper");
        exit(EXIT_FAILURE);
    }
    pid_t pid = fork();
    if (pid == -1) {
        // pid == -1: an error occurred
        perror("can't fork, error occurred");
        exit(EXIT_FAILURE);
    } else if (pid == 0) {
        // pid == 0: we are in the newly created child process
        printf("I am child process with PID: %d\n", getpid());
        execv(argv_list[0], argv_list);
        // execv only returns if it failed
        perror("execv");
        exit(EXIT_FAILURE);
    } else {
        // pid > 0: we are in the parent, and fork returned the child's PID
        printf("I am parent process with PID %d and my child is %d\n", getpid(), pid);
        waitpid(pid, NULL, 0);
        printf("Parent - container stopped!\n");
        // Keep the parent running so orphaned descendants still have a subreaper
        while (1) {
            sleep(1);
        }
    }
    return 0;
}
```
Compile and run it:

```
ubuntu@ip-xxx:~/fork$ gcc -o subreaper subreaper.c
ubuntu@ip-xxx:~/fork$ sudo ./subreaper
current PID: 82506
I am parent process with PID 82506 and my child is 82507
I am child process with PID: 82507
root@ip-xxx:/home/ubuntu/fork# sleep 3000 &
[1] 82515
root@ip-xxx:/home/ubuntu/fork#
```
Observe it from another terminal:

```
ubuntu@ip-xxx:~$ ps -eaf -o pid,ppid,comm
  PID  PPID COMMAND
82185 82184 bash
82505 82185  \_ sudo
82506 82505      \_ subreaper
82507 82506          \_ bash
82515 82507              \_ sleep
... (omitted)
```
The current situation is:

```
subreaper (82506) ==> bash (82507) ==> sleep (82515)
grandparent           child            grandchild
```
Now let's kill the child and see who adopts the grandchild:

```
ubuntu@ip-xxx:~$ sudo kill -9 82507
ubuntu@ip-xxx:~$ ps -eaf -o pid,ppid,comm
  PID  PPID COMMAND
82185 82184 bash
82505 82185  \_ sudo
82506 82505      \_ subreaper
82515 82506          \_ sleep
... (omitted)
```
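The PPID of the grandchild sleep process (82515) is not 1. It is the original grandparent, subreaper (82506). Pretty neat, isn't it? Setting the PR_SET_CHILD_SUBREAPER attribute on a process through the prctl API is enough to let it adopt its orphaned descendants!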
Now let's go back to the Docker container:

```
$ docker run -it --rm alpine ash
/ # ash
/ # sleep 3000 &
/ # ps -eaf -o pid,ppid,comm,args
PID   PPID  COMMAND     COMMAND
    1     0 ash         ash
    6     1 ash         ash
    7     6 sleep       sleep 3000
    8     6 ps          ps -eaf -o pid,ppid,comm,args
```
Back on the host:

```
ubuntu@ip-xxx:~$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED          STATUS          PORTS     NAMES
0a7f306f1dda   alpine    "ash"     30 seconds ago   Up 29 seconds   ... (omitted)
ubuntu@ip-xxx:~$ docker top 0a7f306f1dda -eaf -o pid,ppid,comm
PID     PPID    COMMAND
83073   83052   ash
83099   83073    \_ ash
83100   83099        \_ sleep
ubuntu@ip-xxx:~$ ps -eaf
UID          PID    PPID  C STIME TTY          TIME CMD
root       83052       1  0 14:47 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 0a7f306f1ddadd
root       83073   83052  0 14:47 pts/0    00:00:00 ash
root       83099   83073  0 14:47 pts/0    00:00:00 ash
root       83100   83099  0 14:48 pts/0    00:00:00 sleep 3000
ubuntu     83126   82095  0 14:49 pts/1    00:00:00 ps -eaf
```
To summarize:

```
containerd-shim-runc-v2 (83052)
 \_ ash (83073)
     \_ ash (83099)
         \_ sleep (83100)
```
Now let's kill 83099 (the second ash):

```
ubuntu@ip-xxx:~$ sudo kill -9 83099
ubuntu@ip-xxx:~$ ps -eaf
UID          PID    PPID  C STIME TTY          TIME CMD
root       83052       1  0 14:47 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 0a7f306f1ddadd
root       83073   83052  0 14:47 pts/0    00:00:00 ash
root       83100   83073  0 14:48 pts/0    00:00:00 sleep 3000
ubuntu     83126   82095  0 14:49 pts/1    00:00:00 ps -eaf
```
83100 has been adopted by 83073!

```
containerd-shim-runc-v2 (83052)
 \_ ash (83073)
     \_ sleep (83100)
```

So PID 1 inside the container does appear to take on the init process's job of adopting orphan processes. But is it really that simple?