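I've recently been writing a timeout feature. Suppose the main process has 5 jobs to run; each job is timed, and an error message must be reported when its time runs out.
There are two versions, one for Linux and one for Windows. The logic they share is: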
Main Process :
- for loop job
- fork timeout counter process
- do job
- kill sub process
- next job
Sub Process:
- sleep seconds
- kill main process
- exit()
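In short, I designed a flow in which main and sub attack each other: either the job finishes first and main kills sub, or the timeout counter wakes up first and sub kills main.
When I first studied this topic, I learned the Linux way. Following online tutorials and the basic concepts I had picked up earlier, I wrote a version that runs on Ubuntu. I ran into some problems when implementing it on Windows, though; see that section for details.
I tested the Linux version on Ubuntu 22.04.2 LTS with the built-in Python 3 (version 3.10.6).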
import os
import sys
import time
import signal


def interrupt_handler(signum, frame):
    # Runs in the main process when the timeout counter sends SIGINT.
    print("Kill Main Process")
    raise KeyboardInterrupt


def child_halde_interrupt(signum, frame):
    # Runs in the timeout counter when the main process sends SIGINT.
    print("Kill Time Out Counter")
    sys.exit(0)


if __name__ == "__main__":
    time_out_second = 5
    signal.signal(signal.SIGINT, interrupt_handler)
    for job in range(5):
        pid = os.fork()
        if pid > 0:
            # Main process: do the job, then stop the timeout counter.
            try:
                for i in range((job + 1) * 2):
                    print(str(os.getpid()) + " Kero " + str(i))
                    time.sleep(1)
                os.kill(pid, signal.SIGINT)
            except KeyboardInterrupt:
                print("Job : " + str(job) + " timeout")
                signal.signal(signal.SIGINT, interrupt_handler)
                continue
        else:
            # Sub process: sleep, then interrupt the main process and leave the loop.
            signal.signal(signal.SIGINT, child_halde_interrupt)
            time.sleep(time_out_second)
            os.kill(os.getppid(), signal.SIGINT)
            break
1774 Kero 0
1774 Kero 1
1774 Kero 0
Kill Time Out Counter
1774 Kero 1
1774 Kero 2
1774 Kero 3
1774 Kero 0
Kill Time Out Counter
1774 Kero 1
1774 Kero 2
1774 Kero 3
1774 Kero 4
1774 Kero 5
Kill Main Process
Job : 2 timeout
1774 Kero 0
1774 Kero 1
1774 Kero 2
1774 Kero 3
1774 Kero 4
Kill Main Process
Job : 3 timeout
1774 Kero 0
1774 Kero 1
1774 Kero 2
1774 Kero 3
1774 Kero 4
1774 Kero 5
Kill Main Process
Job : 4 timeout
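For comparison, the same per-job timeout can be had on Linux without a helper process, by letting the kernel deliver `SIGALRM` to the main process. This is a minimal sketch of that alternative, not the approach used in the post: `JobTimeout` and `run_with_alarm` are names made up for illustration, the durations are scaled down to fractions of a second, and it is Unix-only and has to run in the main thread.

```python
import signal
import time


class JobTimeout(Exception):
    """Raised inside the main process when the timer expires."""


def _on_alarm(signum, frame):
    # SIGALRM handler: abort whatever the job is doing right now.
    raise JobTimeout


def run_with_alarm(job, seconds):
    """Run job() and return True if it finished, False if it timed out."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # start the countdown
    try:
        job()
        return True
    except JobTimeout:
        return False
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel a pending timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


if __name__ == "__main__":
    for n in range(5):
        duration = (n + 1) * 0.2  # scaled-down stand-in for the post's jobs
        finished = run_with_alarm(lambda: time.sleep(duration), 0.5)
        print("Job : " + str(n) + (" done" if finished else " timeout"))
```

There is still a small window between the job returning and the timer being cancelled in which the alarm could fire, so this sketch shares the "off by a moment" caveat of the two-process design.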
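The Windows version is a bit awkward. For certain reasons, I ran Windows Sandbox on Windows 10 and tested with Python 3.11.2 installed inside the sandbox.
The Windows version ran into some problems. I'm not sure whether they are caused by OS differences, since I wrote it straight from the Linux approach. The PIDs of the main process and the sub processes, in order, are: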
Job | Main Process | Sub Process |
---|---|---|
0 | 6048 | 5476 |
1 | 6048 | 2188 |
2 | 6048 | 5396 |
3 | 6048 | 4592 |
4 | 6048 | 3836 |
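According to the logic of the code, with 5 jobs, the timeout counter should be shut down by the main process for the first 2 jobs, and the main process should be interrupted by the timeout counter for the last 3.
There may be some delay while the processes kill each other, so who kills whom is not perfectly precise. This feature does not need high precision, though: the timeout will probably be set to 30 seconds or more, so being off by 1 or 2 seconds is not a serious problem, and a job that needs 28 seconds to finish deserves attention anyway.
Below is my summary of the problems I observed in the output: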
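The expected split follows from simple arithmetic: job `n` prints `(n + 1) * 2` lines at one line per second, and the counter sleeps for 5 seconds. A small sketch that spells this out (`expected_outcome` is a name made up for illustration, and delays between processes are ignored):

```python
TIME_OUT_SECOND = 5  # same value as time_out_second in the post


def expected_outcome(job, timeout=TIME_OUT_SECOND):
    """Return "done" if the job should finish before the counter wakes up."""
    duration = (job + 1) * 2  # seconds: one print + sleep(1) per iteration
    return "done" if duration < timeout else "timeout"


for job in range(5):
    print(job, (job + 1) * 2, expected_outcome(job))
# 0 2 done
# 1 4 done
# 2 6 timeout
# 3 8 timeout
# 4 10 timeout
```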
import os
import signal
from time import sleep
from multiprocessing import Process


def interrupt_handler(signum, frame):
    # Runs in the main process when it receives SIGBREAK.
    print("Main Process was killed")
    raise KeyboardInterrupt


def child_halde_interrupt(signum, frame):
    # Runs in a timeout counter when it receives SIGBREAK.
    print("Timeout Counter : " + str(os.getpid()) + " was killed")
    raise KeyboardInterrupt


def subProcess():
    # Timeout counter: sleep, then send CTRL_BREAK_EVENT to the parent.
    print("I'm " + str(os.getpid()))
    signal.signal(signal.SIGBREAK, child_halde_interrupt)
    try:
        sleep(5)
        print(str(os.getpid()) + " kill " + str(os.getppid()))
        os.kill(os.getppid(), signal.CTRL_BREAK_EVENT)
    except KeyboardInterrupt:
        print(str(os.getpid()) + " is killed")


if __name__ == "__main__":
    time_out_second = 5
    signal.signal(signal.SIGBREAK, interrupt_handler)
    for job in range(5):
        p = Process(target=subProcess)
        p.start()
        # Main process: do the job, then stop the timeout counter.
        try:
            for i in range((job + 1) * 2):
                print(str(os.getpid()) + " Kero " + str(i))
                sleep(1)
            print(str(os.getpid()) + " kill " + str(p.pid))
            os.kill(p.pid, signal.CTRL_BREAK_EVENT)
        except KeyboardInterrupt:
            print("Job : " + str(job) + " timeout")
            signal.signal(signal.SIGBREAK, interrupt_handler)
            continue
6048 Kero 0
I'm 5476
6048 Kero 1
6048 kill 5476
6048 Kero 0
I'm 2188
Main Process was killed
Job : 1 timeout
6048 Kero 0
I'm 5396
6048 Kero 1
6048 Kero 2
Timeout Counter : 5476 was killed
5476 is killed
6048 Kero 3
6048 Kero 4
2188 kill 6048
Main Process was killed
Job : 2 timeout
6048 Kero 0
Timeout Counter : 5396 was killed
5396 is killed
I'm 4592
6048 Kero 1
6048 Kero 2
6048 Kero 3
6048 Kero 4
6048 Kero 5
4592 kill 6048
Main Process was killed
Job : 3 timeout
6048 Kero 0
I'm 3836
6048 Kero 1
6048 Kero 2
6048 Kero 3
6048 Kero 4
6048 Kero 5
3836 kill 6048
Main Process was killed
Job : 4 timeout
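One possible explanation, offered tentatively: the Python documentation describes `CTRL_C_EVENT` and `CTRL_BREAK_EVENT` as special signals that can only be sent to console processes sharing a common console window, so `os.kill` with these values may not target a single PID the way `SIGINT` does on Linux. A design that avoids signals altogether is to invert the roles: run the job in the child and let the main process do the timing. The sketch below assumes each job can be expressed as a command line; `run_job_with_timeout` and the inline job script are hypothetical, and the durations are scaled down.

```python
import subprocess
import sys


def run_job_with_timeout(command, seconds):
    """Run command as a child process; return True if it exits within seconds."""
    try:
        subprocess.run(command, timeout=seconds, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before re-raising.
        return False


if __name__ == "__main__":
    for job in range(5):
        loops = (job + 1) * 2
        # Hypothetical job: a child interpreter that sleeps 0.2 s per loop.
        script = "import time\nfor i in range({}):\n    time.sleep(0.2)".format(loops)
        finished = run_job_with_timeout([sys.executable, "-c", script], 1.0)
        print("Job : " + str(job) + (" done" if finished else " timeout"))
```

Because only the parent ever terminates anything, there is no mutual-kill race, and the same code runs on both Linux and Windows. The trade-off is that the job no longer runs inside the main process, so any result has to come back through the exit code, a file, or a pipe.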