2024 iThome Ironman Contest

DAY 12

Python

Part 12 of the series 為你自己讀 CPython 原始碼 (Reading the CPython Source Code for Yourself)

Day 12 - From Preparation to Takeoff!

高見龍

2024-09-26 22:24:59

This article is also published on 為你自己學 Python (Learn Python for Yourself) under the title "From Preparation to Takeoff!"

Suppose you wrote a Python program like this:

# File: hello.py

def greeting(name):
    return f"Hello, {name}!"

print(greeting("Kitty"))

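When you type python hello.py in the terminal, it happily prints Hello, Kitty!, but do you know what the Python interpreter actually did to get there? And if you want to trace the CPython source code yourself, where should you start? In my book 為你自己學 Python (Learn Python for Yourself), one chapter introduces pdb, Python's built-in debugger, which lets you set breakpoints and step through a Python program line by line. To trace the Python interpreter itself, though, pdb is no help; we need a debugger designed for C programs.

Using a Debugger

The C debuggers most commonly seen in industry are GDB and LLDB. Their features and commands are broadly similar, but since my machine runs macOS, LLDB is the easier choice for me, so I will use it in the examples here. The command I originally wanted to run is: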

$ ./python.exe hello.py

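Here python.exe is the CPython interpreter I compiled myself (on macOS the build names the binary python.exe so that it does not clash with the Python/ source directory on a case-insensitive file system). Now I want LLDB to run this command for me, so I put lldb in front of it: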

$ lldb ./python.exe hello.py

(lldb) target create "./python.exe"
Current executable set to '/Users/kaochenlong/sources/python/cpython/python.exe' (arm64).
(lldb) settings set -- target.run-args  "hello.py"
(lldb) breakpoint set --name main
Breakpoint 1: 13 locations.
(lldb)

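Here breakpoint set --name main puts a breakpoint on the main function, the entry point of the whole program, so that execution stops as soon as the program starts and we can step through it from there. If the command feels too verbose, b main does the same thing.

With the breakpoint set, we can start the program:

Program Entry Point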

(lldb) run
Process 77203 launched: '/Users/kaochenlong/sources/python/cpython/python.exe' (arm64)
Process 77203 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003d1c python.exe`main(argc=2, argv=0x000000016fdfe128) at python.c:15:12 [opt]
   12   int
   13   main(int argc, char **argv)
   14   {
-> 15       return Py_BytesMain(argc, argv);
   16   }
   17   #endif
Target 0: (python.exe) stopped.
warning: python.exe was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb)

Running the run command in LLDB starts the program, and execution stops when it reaches the main function. The output shows that we are at line 15 of python.c, about to call Py_BytesMain().

Following Py_BytesMain():

// File: Modules/main.c

int
Py_BytesMain(int argc, char **argv)
{
    _PyArgv args = {
        .argc = argc,
        .use_bytes_argv = 1,
        .bytes_argv = argv,
        .wchar_argv = NULL};
    return pymain_main(&args);
}

It only packs the command-line arguments into a _PyArgv struct and then calls pymain_main():

// File: Modules/main.c

static int
pymain_main(_PyArgv *args)
{
    // ... omitted ...
    return Py_RunMain();
}

Next, Py_RunMain():

// File: Modules/main.c

int
Py_RunMain(void)
{
    int exitcode = 0;

    pymain_run_python(&exitcode);

    // ... omitted ...
}

pymain_run_python() is where things start to get interesting:

// File: Modules/main.c

static void
pymain_run_python(int *exitcode)
{
    // ... omitted ...

    if (config->run_command) {
        *exitcode = pymain_run_command(config->run_command);
    }
    else if (config->run_module) {
        *exitcode = pymain_run_module(config->run_module, 1);
    }
    else if (main_importer_path != NULL) {
        *exitcode = pymain_run_module(L"__main__", 0);
    }
    else if (config->run_filename != NULL) {
        *exitcode = pymain_run_file(config);
    }
    else {
        *exitcode = pymain_run_stdin(config);
    }

    // ... omitted ...
}
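
The branches above line up with the different ways of launching the interpreter: -c fills run_command, -m fills run_module, a directory or zip file containing __main__.py sets main_importer_path, a script path fills run_filename, and with none of these the program is read from stdin. As a rough illustration (not part of the CPython sources), this Python sketch starts the current interpreter once per mode through subprocess. The names hello.py, app, run_python and demo_launch_modes are made up for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

SOURCE = 'print("Hello, Kitty!")\n'


def run_python(args, cwd, stdin_text=None):
    """Run the current interpreter with args and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, *args],
        cwd=cwd,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo_launch_modes():
    """Launch the same one-line program through each branch of the dispatch."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        (tmp_path / "hello.py").write_text(SOURCE)
        # A directory with __main__.py can be run directly (hypothetical "app").
        app_dir = tmp_path / "app"
        app_dir.mkdir()
        (app_dir / "__main__.py").write_text(SOURCE)
        return {
            "run_command (-c)": run_python(["-c", SOURCE], tmp),
            "run_module (-m)": run_python(["-m", "hello"], tmp),
            "main_importer_path (directory)": run_python(["app"], tmp),
            "run_filename (script)": run_python(["hello.py"], tmp),
            "stdin": run_python([], tmp, stdin_text=SOURCE),
        }


if __name__ == "__main__":
    for mode, output in demo_launch_modes().items():
        print(f"{mode}: {output}")
```

Every mode prints the same greeting; only the branch taken inside pymain_run_python() differs.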

Because we ran python hello.py, execution takes the config->run_filename branch and calls pymain_run_file():

// File: Modules/main.c

static int
pymain_run_file(const PyConfig *config)
{
    // ... omitted ...

    int res = pymain_run_file_obj(program_name, filename,
                                  config->skip_source_first_line);

    // ... omitted ...
}

Reading the Program File

Up to this point none of our code has run; in fact hello.py has not even been opened yet. The function that really gets things going is pymain_run_file_obj(), which comes next.

// File: Modules/main.c

static int
pymain_run_file_obj(PyObject *program_name, PyObject *filename,
                    int skip_source_first_line)
{
    // ... omitted ...
    FILE *fp = _Py_fopen_obj(filename, "rb");

    // ... omitted ...

    PyCompilerFlags cf = _PyCompilerFlags_INIT;
    int run = _PyRun_AnyFileObject(fp, filename, 1, &cf);
    return (run != 0);
}

The key part of this function is the call to _PyRun_AnyFileObject() near the end:

// File: Python/pythonrun.c

int
_PyRun_AnyFileObject(FILE *fp, PyObject *filename, int closeit,
                     PyCompilerFlags *flags)
{
    // ... omitted ...
    int res;
    if (_Py_FdIsInteractive(fp, filename)) {
        res = _PyRun_InteractiveLoopObject(fp, filename, flags);
        if (closeit) {
            fclose(fp);
        }
    }
    else {
        res = _PyRun_SimpleFileObject(fp, filename, closeit, flags);
    }

    // ... omitted ...
}
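
To make the _Py_FdIsInteractive() check in the listing above more concrete, here is a rough Python approximation of what it does as far as I can tell from the CPython source: a terminal always counts as interactive, and otherwise only the -i style interactive setting combined with a missing or placeholder file name does. The function name fd_is_interactive and the interactive_flag parameter are stand-ins invented for this sketch.

```python
import os
import tempfile


def fd_is_interactive(fileobj, filename, interactive_flag=False):
    """Approximate the decision _Py_FdIsInteractive() makes (assumed logic)."""
    # A file attached to a terminal is always treated as interactive.
    if os.isatty(fileobj.fileno()):
        return True
    # Otherwise the interpreter must have been asked for interactive mode...
    if not interactive_flag:
        return False
    # ...and the "file" must be stdin or an unknown placeholder name.
    return filename is None or filename in ("<stdin>", "???")


def demo():
    """Check a regular on-disk file with and without the interactive flag."""
    with tempfile.TemporaryFile() as regular_file:
        return (
            fd_is_interactive(regular_file, "hello.py"),
            fd_is_interactive(regular_file, "<stdin>", interactive_flag=True),
        )


if __name__ == "__main__":
    print(demo())
```

A regular file such as hello.py yields False, which is why our run continues into the non-interactive branch.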

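Inside this function, _Py_FdIsInteractive() decides whether the input should be treated as an interactive session. It returns true when the file pointer is attached to a terminal, or when interactive mode was requested and the file name is <stdin> or ???. That explains why a run that was given a file name can still end up here: the "file" might be a terminal device rather than a regular file. Note that calling input() inside a script does not switch the interpreter into this mode; the script is still executed as an ordinary file. Our hello.py is a regular file on disk, so execution goes to _PyRun_SimpleFileObject():

// File: Python/pythonrun.c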

int
_PyRun_SimpleFileObject(FILE *fp, PyObject *filename, int closeit,
                        PyCompilerFlags *flags)
{
    // ... omitted ...

    // Get the __main__ module (created if it does not exist yet)
    m = PyImport_AddModule("__main__");
    // ... omitted ...

    // Is the file we were given itself a compiled .pyc file?
    int pyc = maybe_pyc_file(fp, filename, closeit);
    // ... omitted ...

    if (pyc) {
        FILE *pyc_fp;
        // ... omitted ...

        pyc_fp = _Py_fopen_obj(filename, "rb");
        // ... omitted ...

        v = run_pyc_file(pyc_fp, d, d, flags);
    } else {
        // ... omitted ...

        v = pyrun_file(fp, filename, Py_file_input, d, d,
                       closeit, flags);
    }
    // ... omitted ...
}

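This function first fetches the module named __main__ (PyImport_AddModule() creates it if it does not exist yet); our program will run inside this module's namespace, which is the d passed as both globals and locals below. Before reading hello.py as source code, it calls maybe_pyc_file() to check whether the file we were given is itself a compiled .pyc file, judging by its .pyc extension or the magic number at the start of the file. If it is, the file is reopened in binary mode and handed to run_pyc_file(); otherwise pyrun_file() takes over. Since hello.py is plain source code, we end up in pyrun_file().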
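
The idea of running a program inside a module's namespace can be imitated from Python itself. The following is only a sketch: it builds a fresh module object (rather than touching the real __main__), then uses that module's dictionary as both globals and locals, mirroring the d, d arguments in the C code. The helper name run_in_fresh_module is made up for this example.

```python
import types

SOURCE = '''\
def greeting(name):
    return f"Hello, {name}!"
'''


def run_in_fresh_module(source, name="__main__"):
    """Execute source inside a new module's dictionary and return the module."""
    module = types.ModuleType(name)
    namespace = module.__dict__
    # The same dict serves as globals and locals, as in module-level code.
    exec(compile(source, "hello.py", "exec"), namespace, namespace)
    return module


if __name__ == "__main__":
    mod = run_in_fresh_module(SOURCE)
    print(mod.__name__, mod.greeting("Kitty"))
```

After the call, everything the program defined at the top level is an attribute of the module, which is exactly how names defined in a script end up in __main__.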

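Python normally writes .pyc files for modules that get imported and caches them in a __pycache__ directory, so the next import can skip compilation. A script passed directly on the command line, like hello.py here, is compiled again on every run. If you want to generate a .pyc file by hand, use the py_compile module: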

$ python -m py_compile hello.py
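
The same compilation can be driven from Python code, which also makes it easy to peek inside the result. This sketch assumes the .pyc layout used since Python 3.7 (a 16-byte header starting with the magic number, followed by a marshalled code object); the helper name compile_and_run_pyc is made up for the example.

```python
import importlib.util
import marshal
import py_compile
import tempfile
from pathlib import Path

SOURCE = '''\
def greeting(name):
    return f"Hello, {name}!"
'''


def compile_and_run_pyc(source_text):
    """Compile source to a .pyc file, check its magic number, then run it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "hello.py"
        src.write_text(source_text)
        # py_compile.compile() returns the path of the .pyc file it wrote.
        pyc_path = Path(py_compile.compile(str(src), doraise=True))
        data = pyc_path.read_bytes()
    # The first four bytes identify the bytecode version of this interpreter.
    has_magic = data[:4] == importlib.util.MAGIC_NUMBER
    # Assumed layout: 16-byte header, then the marshalled code object.
    code = marshal.loads(data[16:])
    namespace = {"__name__": "__main__"}
    exec(code, namespace)
    return has_magic, namespace


if __name__ == "__main__":
    has_magic, namespace = compile_and_run_pyc(SOURCE)
    print(has_magic, namespace["greeting"]("Kitty"))
```

Because the .pyc file already contains a code object, running it skips the parsing and compiling steps described in the rest of this article.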

Building the Abstract Syntax Tree

Back to pyrun_file():

// File: Python/pythonrun.c

static PyObject *
pyrun_file(FILE *fp, PyObject *filename, int start, PyObject *globals,
           PyObject *locals, int closeit, PyCompilerFlags *flags)
{
    PyArena *arena = _PyArena_New();
    // ... omitted ...

    mod_ty mod;
    mod = _PyParser_ASTFromFile(fp, filename, NULL, start, NULL, NULL,
                                flags, NULL, arena);
    // ... omitted ...

    PyObject *ret;
    if (mod != NULL) {
        ret = run_mod(mod, filename, globals, locals, flags, arena);
    }
    else {
        ret = NULL;
    }
    _PyArena_Free(arena);
    return ret;
}

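This creates an Arena object to manage memory for the AST. Everything allocated while parsing lives in the arena, so once the work is finished the whole arena can be released in one go with no elaborate cleanup logic (in practice, a while loop that walks the arena's list of memory blocks and calls PyMem_Free() on each one). A memory box for holding objects makes sense to me, although I did wonder why it is called an Arena; the term is the conventional name for this kind of region-based allocation.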
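
The arena idea itself fits in a few lines. This is a purely conceptual toy in Python, not how CPython's C implementation looks: the class name Arena and its alloc and free methods are invented here only to show the "allocate many, release all at once" pattern.

```python
class Arena:
    """Toy arena: hands out buffers and releases them all in one call."""

    def __init__(self):
        self._blocks = []

    def alloc(self, size):
        """Allocate a zero-filled buffer that the arena keeps track of."""
        block = bytearray(size)
        self._blocks.append(block)
        return block

    def free(self):
        """Drop every tracked buffer at once and report how many there were."""
        released = len(self._blocks)
        self._blocks.clear()
        return released


if __name__ == "__main__":
    arena = Arena()
    arena.alloc(8)
    arena.alloc(16)
    print(arena.free())
```

The caller never frees individual buffers; a single free() at the end takes care of everything, which is the property pyrun_file() relies on when it calls _PyArena_Free().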

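The name _PyParser_ASTFromFile() pretty much gives away what comes next: it reads the Python source code and converts it into an abstract syntax tree (AST). We will go through the details of that conversion in later chapters; for now it is enough to know that this is the stage where the file is actually read and turned into an AST.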
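
You do not need a C debugger to see what the AST for hello.py looks like, because the standard ast module exposes the same parser. The sketch below parses the program in exec mode, which is the Python-level counterpart of the Py_file_input start symbol seen earlier.

```python
import ast

SOURCE = '''\
def greeting(name):
    return f"Hello, {name}!"

print(greeting("Kitty"))
'''

# Parse the whole file as a module, like running a script.
tree = ast.parse(SOURCE, filename="hello.py", mode="exec")

# The module body holds one node per top-level statement.
top_level = [type(node).__name__ for node in tree.body]
print(top_level)  # ['FunctionDef', 'Expr']

# Show the tree for the print(...) call in a readable form.
print(ast.dump(tree.body[1], indent=4))
```

The two top-level nodes are the def statement and the expression statement that calls print().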

Building the Code Object

Once the conversion is done, execution moves on to run_mod():

// File: Python/pythonrun.c

static PyObject *
run_mod(mod_ty mod, PyObject *filename, PyObject *globals, PyObject *locals,
            PyCompilerFlags *flags, PyArena *arena)
{
    PyThreadState *tstate = _PyThreadState_GET();
    PyCodeObject *co = _PyAST_Compile(mod, filename, flags, -1, arena);

    // ... omitted ...

    PyObject *v = run_eval_code_obj(tstate, co, globals, locals);
    Py_DECREF(co);
    return v;
}

The key point here is that _PyAST_Compile() turns the AST passed in into a Code Object. We will cover Code Objects in detail in a later chapter, together with the AST.
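
The AST-to-Code-Object step can also be tried from Python, since the built-in compile() accepts an AST. This is a small sketch for exploring the result, not a description of what _PyAST_Compile() does internally.

```python
import ast
import dis

SOURCE = '''\
def greeting(name):
    return f"Hello, {name}!"

print(greeting("Kitty"))
'''

tree = ast.parse(SOURCE, "hello.py", "exec")
# compile() accepts an AST as well as source text.
code = compile(tree, "hello.py", "exec")

# Names the module-level code stores or loads.
print(code.co_names)

# The body of greeting() is compiled into its own nested code object.
nested = [const for const in code.co_consts if hasattr(const, "co_code")]
print([inner.co_name for inner in nested])

# Disassemble the bytecode that the interpreter will execute.
dis.dis(code)
```

The exact bytecode printed by dis differs between Python versions, so treat the disassembly as something to explore rather than a fixed reference.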

Liftoff!

Next comes running the compiled Code Object, which happens in run_eval_code_obj():

// File: Python/pythonrun.c

static PyObject *
run_eval_code_obj(PyThreadState *tstate, PyCodeObject *co, PyObject *globals, PyObject *locals)
{
    PyObject *v;
    _PyRuntime.signals.unhandled_keyboard_interrupt = 0;

    // ... omitted ...

    v = PyEval_EvalCode((PyObject*)co, globals, locals);
    if (!v && _PyErr_Occurred(tstate) == PyExc_KeyboardInterrupt) {
        _PyRuntime.signals.unhandled_keyboard_interrupt = 1;
    }
    return v;
}

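At long last, PyEval_EvalCode() is where the code we wrote in hello.py actually runs. After that long detour we have reached the final step: the function executes our Code Object and returns the result, and if nothing goes wrong, Hello, Kitty! appears on the screen.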
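
As a recap, the whole journey can be imitated in a few lines of Python: parse to an AST, compile to a Code Object, then execute it in a namespace. The comments name the C functions each step loosely corresponds to; the helper run_like_cpython is made up for this sketch and skips everything else the real interpreter does (initialization, error handling, cleanup).

```python
import ast
import contextlib
import io

SOURCE = '''\
def greeting(name):
    return f"Hello, {name}!"

print(greeting("Kitty"))
'''


def run_like_cpython(source, filename="hello.py"):
    """Parse, compile and execute source, returning whatever it printed."""
    tree = ast.parse(source, filename, "exec")      # roughly _PyParser_ASTFromFile()
    code = compile(tree, filename, "exec")          # roughly _PyAST_Compile()
    namespace = {"__name__": "__main__"}            # stands in for __main__'s dict
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(code, namespace, namespace)            # roughly PyEval_EvalCode()
    return captured.getvalue()


if __name__ == "__main__":
    print(run_like_cpython(SOURCE), end="")
```

Running it prints the same greeting as python hello.py, just with each stage made explicit.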

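That's a wrap! This is what the Python interpreter goes through to run our hello.py. Many details were skipped along the way, such as the AST, Code Objects, and the execution environment, and we will cover them one by one in later chapters.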