[6.1810][case-study] How to write an executable and run it in xv6-riscv

xv6-riscv

wtommy_fdgkhdkgh 2026-06-08 22:28:44

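Series: [6.1810] Learning basic operating system concepts with MIT 6.1810

Outline

How to write an executable and run it
Files produced by the build
Makefile
user.ld

How to write an executable and run it

Write a simple hello world source file: hello.c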

https://github.com/TommyWu-fdgkhdkgh/xv6-riscv/commit/41f56524a771440553471ea88035624048969580#diff-21b94feaba45d95151070193e13483e1dd49119cd62de960cfb52fa5eabcac7b

The program simply calls printf to print a hello message.
Remember to end the program with exit(0).

Add _hello to the Makefile

https://github.com/TommyWu-fdgkhdkgh/xv6-riscv/commit/41f56524a771440553471ea88035624048969580#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52

The Makefile has to be told that there is one more program to build, so add _hello to its list of user programs.

Boot xv6-riscv and run ls: the hello ELF now shows up in the file system.

$ ls
.              1 1 1024
..             1 1 1024
README         2 2 2425
cat            2 3 36784
echo           2 4 35600
forktest       2 5 17488
grep           2 6 40224
init           2 7 36072
kill           2 8 35536
ln             2 9 35344
ls             2 10 38896
mkdir          2 11 35600
rm             2 12 35576
sh             2 13 58632
stressfs       2 14 36464
usertests      2 15 188592
grind          2 16 51816
wc             2 17 37680
zombie         2 18 34952
logstress      2 19 37520
forphan        2 20 36408
dorphan        2 21 35872
hello          2 22 35064
console        3 23 0

Run it to get the hello world output:

$ ./hello
hello xv6-riscv!!

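Files produced by the build

After hello.c is compiled, the toolchain leaves several files behind.

_hello
- The linked ELF executable. This is the file that gets placed into the xv6-riscv file system.
hello.asm
- The disassembly of the hello program.
hello.d
- Lists the files that the source file hello.c depends on. The output is written in Makefile syntax, so a Makefile can include it directly. That way, when any listed header changes, make notices and recompiles the source file.
- e.g. user/hello.o: user/hello.c kernel/types.h user/user.h
hello.o
- The object file compiled from the source file. It only becomes an ELF executable after the linker has processed it.
hello.sym
- Every symbol in the ELF together with its virtual address, which is handy when debugging.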

Makefile

_%: %.o $(ULIB) $U/user.ld
	$(LD) $(LDFLAGS) -T $U/user.ld -o $@ $< $(ULIB)
	$(OBJDUMP) -S $@ > $*.asm
	$(OBJDUMP) -t $@ | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > $*.sym

This rule links a user program. Let's read it line by line.

_%: %.o $(ULIB) $U/user.ld

The underscore ( _ ) is the prefix xv6-riscv uses to mark user programs, e.g. _ls, _cat … Note that the source files do not carry this prefix ( e.g. ls.c, cat.c ).
In a Makefile rule, whatever is to the left of the colon : is the target and whatever is to the right is the list of prerequisites ( dependencies ). Make first brings every prerequisite up to date, and only then runs the recipe below the rule to produce the target.
target
- _%
  - e.g. user/_ls, user/_cat, user/_hello …
prerequisites
- %.o
  - e.g. user/ls.o, user/cat.o, user/hello.o …
- $(ULIB)
  - user library, e.g. user/ulib.o, user/usys.o …
- $U/user.ld
  - the linker script for user programs
  - Once the object files exist, how should they be linked together? The linker script sets the rules the linker follows, including the memory layout.

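Here are the Makefile symbols used in this rule.

Suppose the rule currently being matched is user/_ls: user/ls.o $(ULIB) $U/user.ld
_%
- Matches any file name that starts with _ ( the directory part is set aside during matching )
- e.g. for user/_ls, % matches ls, so $* becomes user/ls
$@
- the name of the target
- user/_ls
$<
- the first prerequisite
- user/ls.o
$^
- all of the prerequisites
- user/ls.o $(ULIB) $U/user.ld
$*
- the stem of the target (the text matching the % in the pattern).
- % matches ls here, and make puts the directory back in front, so $* is user/ls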
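
To make the stem rule concrete, here is a minimal C sketch of how $* could be derived for a pattern of the form "prefix%" such as _%. The function name stem_for_prefix_pattern is hypothetical and only models the behaviour described above (directory stripped before matching, then put back in front of the stem); it is not code from GNU Make.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only: computes what $* would be
 * for a target matched against a pattern "<prefix>%" whose prefix has no
 * slash. Returns 0 on success, -1 if the pattern does not match or the
 * output buffer is too small. */
int stem_for_prefix_pattern(const char *target, const char *prefix,
                            char *out, size_t out_size)
{
    /* Split the target into directory part and file name. */
    const char *slash = strrchr(target, '/');
    const char *base = slash ? slash + 1 : target;
    size_t dir_len = (size_t)(base - target);   /* includes the trailing '/' */
    size_t prefix_len = strlen(prefix);

    /* The file name has to start with the prefix ... */
    if (strncmp(base, prefix, prefix_len) != 0)
        return -1;

    /* ... and % has to match a non-empty remainder. */
    const char *matched = base + prefix_len;
    if (*matched == '\0')
        return -1;

    /* The stem is the directory followed by whatever % matched. */
    int n = snprintf(out, out_size, "%.*s%s", (int)dir_len, target, matched);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Calling stem_for_prefix_pattern("user/_ls", "_", buf, sizeof buf) fills buf with user/ls, which is why $*.asm expands to user/ls.asm rather than ls.asm.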

	$(LD) $(LDFLAGS) -T $U/user.ld -o $@ $< $(ULIB)

This step links several object files into the final executable.
$(LD)
- the linker itself, typically something like riscv64-xxx-ld
$(LDFLAGS)
- flags passed to the linker
-T $U/user.ld
- -T tells the linker to use the given linker script, which is how we control the memory layout.
-o $@
- the file name of the output ELF
$<
- as mentioned above, the first prerequisite ( e.g. user/ls.o )
$(ULIB)
- user library, e.g. user/ulib.o, user/usys.o …
Summary
- Link user/ls.o and the user library into a single ELF ( e.g. user/_ls ) whose memory layout follows user/user.ld.

	$(OBJDUMP) -S $@ > $*.asm

Writes out the disassembly of the ELF, which is handy when debugging.
-S
- Disassemble the executable, mixing source code with assembly instructions (requires debugging symbols in the object files).
>
- redirects the output
$*.asm
- % matched ls earlier, so $* is user/ls.
- Adding the suffix gives user/ls.asm

	$(OBJDUMP) -t $@ | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > $*.sym

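$(OBJDUMP) -t $@
- prints the symbol table of the target executable
sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d'
- reformats the output: deletes everything up to and including the SYMBOL TABLE header line, collapses each remaining line to its first and last space-separated fields ( address and symbol name ), and drops empty lines. The $$ is how a Makefile writes a literal $.
$*.sym
- the processed text is written to user/ls.sym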
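
The middle sed command, s/ .* / /, is the least obvious part. As a rough illustration, the same transformation can be written in C. The function name keep_first_and_last_field is made up for this sketch, and the sample line in the usage note only imitates the general shape of objdump -t output; it is not copied from a real run.

```c
#include <string.h>

/* Hypothetical helper mimicking sed's  s/ .* / /  on a single line,
 * in place. The regex is greedy, so it matches from the first space
 * through the last space and replaces that whole span with one space. */
void keep_first_and_last_field(char *line)
{
    char *first = strchr(line, ' ');
    char *last = strrchr(line, ' ');

    /* The pattern needs two distinct spaces; otherwise sed leaves the
     * line untouched, and so do we. */
    if (first == NULL || first == last)
        return;

    /* Keep the first space, then pull the text after the last space up
     * right behind it (the +1 copies the terminating NUL as well). */
    memmove(first + 1, last + 1, strlen(last + 1) + 1);
}
```

Given the line "0000000000000000 g     F .text<TAB>000000000000001c main", the function leaves "0000000000000000 main", which is the address-and-name format stored in the .sym file.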

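This rule produces the ELF ( e.g. _ls ) from an object file ( e.g. ls.o ).
So which rule produces the object file ( e.g. ls.o )?

We can run make -W user/ls.c -n user/_ls to see every step that goes into building user/_ls: -W user/ls.c makes make treat user/ls.c as just modified, and -n prints the commands without running them.

The output shows the compile step, yet there is no %.o : %.c rule anywhere in the Makefile!

GNU Make distinguishes two kinds of rules. An explicit rule names its targets outright. An implicit rule describes how to build a whole class of files; pattern rules such as _%: %.o are implicit rules written in the Makefile, and GNU Make also ships with a set of built-in ones.

When the Makefile gives no recipe for a target, make searches the implicit rules, including the built-in ones, for one that applies.

To list the built-in implicit rules, run make -p -f /dev/null > default_rules.txt. Because it reads /dev/null instead of a Makefile, the dump contains only what GNU Make provides by default.

The built-in implicit rule used here is, in effect,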

    %.o: %.c
            $(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@ $<

$@
- the name of the target
- for ls.o: ls.c, $@ is ls.o
$<
- the first prerequisite
- for ls.o: ls.c aaa bbb, $< is ls.c
$^
- all of the prerequisites
- for ls.o: ls.c aaa bbb, $^ is ls.c aaa bbb

user.ld

OUTPUT_ARCH( "riscv" )

Declares that the target architecture of the output binary is RISC-V.

SECTIONS

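Opens the SECTIONS command, which tells the linker how to map input sections into output sections and where to place them in memory.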

{

 . = 0x0;

the dot . : location counter
The location counter tracks the current virtual address.
xv6-riscv is designed so that a user program starts at virtual address 0x0.

  .text : {
    *(.text .text.*)
  }

Collects the .text sections from all object files.
.text holds the executable instructions.

  .rodata : {
    . = ALIGN(16);
    *(.srodata .srodata.*) /* do not need to distinguish this from .rodata */
    . = ALIGN(16);
    *(.rodata .rodata.*)
  }

. = ALIGN(16);
- advances the location counter to the next 16-byte boundary, so whatever follows starts 16-byte aligned
srodata : small read-only data
rodata : standard read-only data

  .eh_frame : {
       *(.eh_frame)
       *(.eh_frame.*)
   }

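eh_frame : exception handling frames. This section holds call frame information ( unwind tables ) that describes how to walk back up the stack, as used by C++ exception handling and by debuggers.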

  . = ALIGN(0x1000);

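Here the alignment suddenly jumps to 0x1000 ( 4KB, exactly one page boundary ).
This is the border between read-only and writable data. Page-table permissions apply to whole pages, so the writable data has to start on a fresh page for the read-only and writable parts to get different permissions.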
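
The arithmetic behind both ALIGN(16) and ALIGN(0x1000) is the same round-up to a power of two. A small C sketch, where align_up is a name chosen for this example and the addresses in the usage note are made up rather than taken from a real xv6 binary:

```c
#include <stdint.h>

/* Round addr up to the next multiple of align, where align must be a
 * power of two. This mirrors what the linker's ALIGN() does to the
 * location counter: (. + align - 1) & ~(align - 1). */
static inline uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

For instance, if the read-only data happened to end at 0xe52, align_up(0xe52, 16) gives 0xe60 and align_up(0xe52, 0x1000) gives 0x1000, so .data would start at the beginning of the second page. An address that is already aligned stays where it is: align_up(0x1000, 0x1000) is 0x1000.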

  .data : {
    . = ALIGN(16);
    *(.sdata .sdata.*) /* do not need to distinguish this from .data */
    . = ALIGN(16);
    *(.data .data.*)
  }

. = ALIGN(16); : 16-byte alignment.
sdata : small initialized data
data : standard initialized data

  .bss : {
    . = ALIGN(16);
    *(.sbss .sbss.*) /* do not need to distinguish this from .bss */
    . = ALIGN(16);
    *(.bss .bss.*)
  }

sbss : small uninitialized data
bss : standard uninitialized data

  PROVIDE(end = .);
}

If the object files reference the symbol end without defining it, the linker provides end as a global symbol.
Its value is the current virtual address ( the current location counter ), which at this point is just past the .bss section.
An operating system can use this to tell where the heap ( dynamic memory ) may begin, although as far as I know the xv6-riscv kernel does not use it ( see kexec in kernel/exec.c for details ).
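
From C, a linker-provided symbol like end is reached through an extern declaration. The sketch below assumes a toolchain whose linker script provides end after .bss, which is true for user.ld above and, as far as I know, for the default GNU ld script on Linux as well. The names image_end, bss_probe and bss_is_below_end are invented for this example.

```c
#include <stdint.h>

/* Declared here, defined nowhere in C: the linker script supplies it
 * through PROVIDE(end = .). Only the address of the symbol is
 * meaningful, not its contents. */
extern char end[];

/* A zero-initialized static array, which the compiler places in .bss. */
static char bss_probe[64];

/* Address just past the program image, as seen by the linker. */
uintptr_t image_end(void)
{
    return (uintptr_t)end;
}

/* Since end is placed after .bss, everything in .bss should lie below it. */
int bss_is_below_end(void)
{
    return (uintptr_t)(bss_probe + sizeof bss_probe) <= image_end();
}
```

Because the definition is wrapped in PROVIDE, a program that defines its own variable called end still links; the linker only steps in when the symbol is referenced but not defined.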