[6.1810][lab] system calls (Part 1)

xv6-riscv

wtommy_fdgkhdkgh 2026-06-19 00:29:50 ‧ 789 views

Series: [6.1810] Learning Basic Operating System Concepts with MIT 6.1810

Outline

Using gdb (easy)

Using gdb (easy)

Assignment requirements

Use gdb to set breakpoints in the xv6-riscv kernel. The example here sets a breakpoint on the syscall function, then tries out layout src and backtrace.

(gdb) b syscall
Breakpoint 1 at 0x80002142: file kernel/syscall.c, line 243.
(gdb) c
Continuing.
[Switching to Thread 1.2]

Thread 2 hit Breakpoint 1, syscall () at kernel/syscall.c:243
243     {
(gdb) layout src
(gdb) backtrace

Looking at the backtrace output, which function called syscall?
Type n a few times to step past struct proc *p = myproc(); Once past this statement, type p /x *p, which prints the current process's proc struct (see kernel/proc.h) in hex.
What is the value of p->trapframe->a7 and what does that value represent? (Hint: look at user/init.c, the first user program xv6 starts, and its compiled assembly user/init.asm.)
The processor is running in supervisor mode, and we can print privileged registers such as sstatus.

(gdb) p /x $sstatus

What was the previous mode that the CPU was in?
The xv6 kernel code contains consistency checks whose failure causes the kernel to panic; you may find that your kernel modifications cause panics. For example, replace the statement num = p->trapframe->a7; with num = * (int *) 0; at the beginning of syscall, run make qemu, and you will see something similar to:

xv6 kernel is booting

hart 2 starting
hart 1 starting
scause=0xd sepc=0x80001bfe stval=0x0
panic: kerneltrap

To track down the source of a kernel page-fault panic, search for the sepc value printed for the panic you just saw in the file kernel/kernel.asm, which contains the assembly for the compiled kernel. Write down the assembly instruction the kernel is panicing at. Which register corresponds to the variable num?
To inspect the state of the processor and the kernel at the faulting instruction, fire up gdb, and set a breakpoint at the faulting epc, like this:

(gdb) b *0x80001bfe
Breakpoint 1 at 0x80001bfe: file kernel/syscall.c, line 138.
(gdb) layout asm
(gdb) c
Continuing.
[Switching to Thread 1.3]

Thread 3 hit Breakpoint 1, syscall () at kernel/syscall.c:138

Confirm that the faulting assembly instruction is the same as the one you found above.
Why does the kernel crash? Hint: look at figure 3-3 in the text; is address 0 mapped in the kernel address space? Is that confirmed by the value in scause above? (See description of scause in RISC-V privileged instructions)
Note that scause was printed by the kernel panic above, but often you need to look at additional info to track down the problem that caused the panic. For example, to find out which user process was running when the kernel paniced, you can print the process's name:

   (gdb) p p->name

What is the name of the process that was running when the kernel paniced? What is its process id (pid)?

Hints

Using the GNU Debugger : https://pdos.csail.mit.edu/6.828/2019/lec/gdb_slides.pdf
guidance page : https://pdos.csail.mit.edu/6.1810/2025/labs/guidance.html

Walkthrough

Task 1

Goal:
Use gdb to set breakpoints in the xv6-riscv kernel. The example here sets a breakpoint on the syscall function, then tries out layout src and backtrace.

First, install gdb:

sudo apt-get install gdb-multiarch

Then, from inside the xv6-riscv directory, run make qemu-gdb:

*** Now run 'gdb' in another window.
qemu-system-riscv64 -machine virt -bios none -kernel kernel/kernel -m 128M -smp 3 -nographic -global virtio-mmio.force-legacy=false -drive file=fs.img,if=none,format=raw,id=x0 -device virtio-blk-device,drive=x0,bus=virtio-mmio-bus.0 -S -gdb tcp::26544

-S : QEMU stops at the first instruction and waits until the GDB client sends a continue command.
-gdb tcp::26544 : QEMU opens a TCP server on port 26544 and waits for an incoming remote GDB client connection.

After that, we can connect to QEMU with gdb:

gdb-multiarch kernel/kernel
(gdb) target remote localhost:26544
Remote debugging using localhost:26544
0x0000000000001000 in ?? ()

At this point the PC is stopped at 0x1000.

Now we can carry out what the assignment asks for.

(gdb) b syscall
Breakpoint 1 at 0x8000282a: file kernel/syscall.c, line 133.
(gdb) c
Continuing.

Thread 1 hit Breakpoint 1, syscall () at kernel/syscall.c:133
133     {
(gdb) layout src
(gdb) backtrace
#0  syscall () at kernel/syscall.c:133
#1  0x00000000800025f0 in usertrap () at kernel/trap.c:68
#2  0x0000003ffffff09c in ?? ()
(gdb) bt
#0  syscall () at kernel/syscall.c:133
#1  0x00000000800025f0 in usertrap () at kernel/trap.c:68
#2  0x0000003ffffff09c in ?? ()

Task 2

Goal: Looking at the backtrace output, which function called syscall?

The output above shows that usertrap is the function that called syscall.
The source code confirms this: usertrap (kernel/trap.c) does call syscall.

Task 3

Goal: Type n a few times to step past struct proc *p = myproc(); Once past this statement, type p /x *p, which prints the current process's proc struct (see kernel/proc.h) in hex.

Use the gdb command n until execution has moved past struct proc *p = myproc(), then run p /x *p:

(gdb) p /x *p
$1 = {lock = {locked = 0x0, name = 0x80007178, cpu = 0x0}, state = 0x4, chan = 0x0, killed = 0x0, xstate = 0x0, pid = 0x1, parent = 0x0,
  kstack = 0x3fffffd000, sz = 0x4000, pagetable = 0x87f52000, trapframe = 0x87f56000, context = {ra = 0x80001e52, sp = 0x3fffffdc50,
    s0 = 0x3fffffdc80, s1 = 0x8000fd88, s2 = 0x8000f958, s3 = 0x0, s4 = 0x3, s5 = 0x80020a28, s6 = 0x0, s7 = 0x180, s8 = 0xffffffffffffffff,
    s9 = 0x400, s10 = 0x2, s11 = 0x38}, ofile = {0x0 <repeats 16 times>}, cwd = 0x8001de98, name = {0x69, 0x6e, 0x69, 0x74,
    0x0 <repeats 12 times>}}
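
Because p /x prints every field in hex, the name field shows up as raw bytes rather than as a string. As a small illustration, the sketch below turns those bytes back into text. The helper name_to_string and the array proc_name_bytes are hypothetical names made up for this example; they are not part of xv6.

```c
#include <stddef.h>

/* Hypothetical helper (not part of xv6): copy a fixed-size, NUL-padded
 * byte array, like the 16-byte name field printed above, into a C string. */
static void name_to_string(const unsigned char *bytes, size_t len,
                           char *out, size_t outlen)
{
    size_t i = 0;

    if (outlen == 0)
        return;
    /* Stop at the first NUL byte, or when either buffer runs out. */
    while (i < len && i + 1 < outlen && bytes[i] != 0) {
        out[i] = (char)bytes[i];
        i++;
    }
    out[i] = '\0';
}

/* The bytes gdb printed for name: 0x69 0x6e 0x69 0x74, then twelve zeros
 * (the remaining array elements are zero-initialized). */
static const unsigned char proc_name_bytes[16] = { 0x69, 0x6e, 0x69, 0x74 };
```

Calling name_to_string(proc_name_bytes, sizeof proc_name_bytes, buf, sizeof buf) with a 17-byte buf should leave the string init in buf, which matches the process name seen later with p p->name.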

Task 4

Goal: What is the value of p->trapframe->a7 and what does that value represent? (Hint: look at user/init.c, the first user program xv6 starts, and its compiled assembly user/init.asm.)

What is the value of p->trapframe->a7?

(gdb) p p->trapframe->a7
$3 = 15

What does that value represent?
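- According to user/usys.S, a7 holds the number of the system call the user program wants to invoke.
- According to kernel/syscall.h, a7 == 15 is SYS_open, so the system call being requested is sys_open in kernel/sysfile.c.
- To verify this, look at user/init.asm: the open stub executes li a7, 15 right before ecall.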

00000000000003c4 <open>:
.global open         
open:   
 li a7, SYS_open
 3c4:   48bd                    li      a7,15
 ecall  
 3c6:   00000073                ecall   
 ret
 3ca:   8082                    ret
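
To make the role of a7 more concrete, here is a minimal sketch of a table-driven dispatcher in which the number placed in a7 selects the handler. It is only an illustration of the idea, not a copy of kernel/syscall.c: stub_sys_open, syscall_table, and dispatch are hypothetical names, and the stub's return value is arbitrary. Only the value 15 for SYS_open comes from the listing above.

```c
#include <stddef.h>
#include <stdint.h>

#define SYS_open 15 /* value taken from the "li a7,15" listing above */

/* Stub standing in for the real sys_open; the return value is arbitrary. */
static uint64_t stub_sys_open(void)
{
    return 3;
}

/* Table indexed by syscall number; slots that are not listed stay NULL. */
static uint64_t (*const syscall_table[])(void) = {
    [SYS_open] = stub_sys_open,
};

#define NSYSCALLS (sizeof(syscall_table) / sizeof(syscall_table[0]))

/* Hypothetical dispatcher: a7 picks the handler, and an unknown
 * number yields -1 (stored as an unsigned 64-bit value). */
static uint64_t dispatch(uint64_t a7)
{
    if (a7 > 0 && a7 < NSYSCALLS && syscall_table[a7] != NULL)
        return syscall_table[a7]();
    return (uint64_t)-1;
}
```

With this sketch, dispatch(15) runs the open stub, while a number with no table entry falls through to the error path. Using an array of function pointers keeps the lookup constant-time and makes adding a system call a one-line change.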

Task 5

Goal: The processor is running in supervisor mode, and we can print privileged registers such as sstatus.

(gdb) p/x $sstatus
$5 = 0x200000022

Task 6

Goal: What was the previous mode that the CPU was in?

Look at bit 8 of sstatus (SPP, Supervisor Previous Privilege), counting from bit 0:
0 means User mode
1 means Supervisor mode
From the previous task, sstatus == 0x200000022. Only bits 1, 5, and 33 are set in that value, so SPP == 0.
The previous mode was therefore User mode.
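
The bit test can also be written out as code. This is a minimal sketch for checking the arithmetic by hand; sstatus_spp and previous_mode are hypothetical helper names, not xv6 functions.

```c
#include <stdint.h>

#define SSTATUS_SPP_BIT 8 /* SPP is bit 8 of sstatus, counting from 0 */

/* Return the SPP bit: 0 if the trap came from user mode, 1 if from supervisor mode. */
static int sstatus_spp(uint64_t sstatus)
{
    return (int)((sstatus >> SSTATUS_SPP_BIT) & 1);
}

/* Map the SPP bit to a readable mode name. */
static const char *previous_mode(uint64_t sstatus)
{
    return sstatus_spp(sstatus) ? "supervisor" : "user";
}
```

For the value read in gdb, previous_mode(0x200000022) gives "user", since 0x200000022 has no 0x100 bit set.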

Task 7

Goal:
The xv6 kernel code contains consistency checks whose failure causes the kernel to panic; you may find that your kernel modifications cause panics. For example, replace the statement num = p->trapframe->a7; with num = * (int *) 0; at the beginning of syscall, run make qemu, and you will see something similar to:

xv6 kernel is booting

hart 2 starting
hart 1 starting
scause=0xd sepc=0x80001bfe stval=0x0
panic: kerneltrap

I made the corresponding change inside the syscall function in kernel/syscall.c. The result on my machine is:

xv6 kernel is booting

hart 1 starting
hart 2 starting
scause=0xd sepc=0x8000283a stval=0x0
panic: kerneltrap

scause == 0xd
- Load page fault
sepc == 0x8000283a
- The PC (program counter) value at the time of the fault
stval == 0x0
- The address that was being accessed when the load page fault occurred
- If num = * (int *) 0 is changed to num = * (int *) 1, stval becomes 1.
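
As a quick reference for reading panic lines like the one above, the sketch below maps a few scause values to names. The exception codes follow the RISC-V privileged specification, but the table is deliberately incomplete, and scause_name is a hypothetical helper, not something xv6 provides.

```c
#include <stdint.h>

/* Hypothetical helper: name a few common scause values.
 * Codes not listed here fall through to a generic label. */
static const char *scause_name(uint64_t scause)
{
    /* The top bit distinguishes interrupts from exceptions. */
    if (scause >> 63)
        return "interrupt";

    switch (scause) {
    case 2:
        return "illegal instruction";
    case 8:
        return "environment call from U-mode";
    case 12:
        return "instruction page fault";
    case 13:
        return "load page fault";
    case 15:
        return "store/AMO page fault";
    default:
        return "other exception";
    }
}
```

Here scause_name(0xd) returns "load page fault", which is the case seen in this panic, and scause_name(8) is the ecall case that leads usertrap to call syscall in the first place.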

Task 8

Goal:
To track down the source of a kernel page-fault panic, search for the sepc value printed for the panic you just saw in the file kernel/kernel.asm, which contains the assembly for the compiled kernel. Write down the assembly instruction the kernel is panicing at. Which register corresponds to the variable num?

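Using the sepc value from the previous task (0x8000283a in my environment; yours may differ slightly), a search in kernel/kernel.asm shows that the faulting instruction is lw a3,0(zero).
The variable num corresponds to a3, the destination register of this load. stval does not hold num; it records the faulting address (0x0 here).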

Task 9

Goal:
To inspect the state of the processor and the kernel at the faulting instruction, fire up gdb, and set a breakpoint at the faulting epc, like this:

(gdb) b *0x80001bfe
Breakpoint 1 at 0x80001bfe: file kernel/syscall.c, line 138.
(gdb) layout asm
(gdb) c
Continuing.
[Switching to Thread 1.3]

Thread 3 hit Breakpoint 1, syscall () at kernel/syscall.c:138

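First run make qemu to get the latest sepc. (This run uses the num = * (int *) 1 variant mentioned in Task 7, so stval is 0x1 and sepc has moved to 0x8000283c.)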

xv6 kernel is booting

hart 1 starting
hart 2 starting
scause=0xd sepc=0x8000283c stval=0x1
panic: kerneltrap

In kernel/kernel.asm, look up which instruction sits at pc == 0x8000283c:

    8000283c:   00104703                lbu     a4,1(zero) # 1 <_entry-0x7fffffff>

Setting a breakpoint at this address in gdb confirms that it is the same instruction:

(gdb) b *0x8000283c
Breakpoint 1 at 0x8000283c: file kernel/syscall.c, line 138.
(gdb) target remote localhost:26544
(gdb) c
Continuing.
Thread 1 hit Breakpoint 1, 0x000000008000283c in syscall () at kernel/syscall.c:138
138       num = *(int *) 1;
(gdb) layout asm

│   0x80002838 <syscall+14> mv      s1,a0
│   0x8000283a <syscall+16> li      a3,1
│B+>0x8000283c <syscall+18> lbu     a4,1(zero) # 0x1
│   0x80002840 <syscall+22> lbu     a5,1(a3)

Task 10

Goal:
Confirm that the faulting assembly instruction is the same as the one you found above.

Task 9 already confirmed that the instruction found in kernel/kernel.asm is the same as the faulting instruction shown by gdb.

Task 11

Goal:
Why does the kernel crash? Hint: look at figure 3-3 in the text; is address 0 mapped in the kernel address space? Is that confirmed by the value in scause above? (See description of scause in RISC-V privileged instructions)

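According to figure 3.3 of the xv6-riscv book ( https://pdos.csail.mit.edu/6.1810/2025/xv6/book-riscv-rev5.pdf ), the range 0x0 to 0x1000 is not mapped in the kernel root page table.
So a load whose virtual address falls in 0x0 to 0x1000 raises a load page fault. This matches the panic output: scause=0xd is the load page fault code, and stval holds the address that was accessed.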
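
The mapping argument can be sketched as a range lookup. This is a simplified model only: the addresses and sizes below are assumptions based on a typical xv6-riscv kernel/memlayout.h and may differ in your tree, the trampoline page and the per-process kernel stacks near the top of the address space are left out, and kernel_va_mapped is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open range [start, end) of kernel virtual addresses. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Assumed low-address kernel mappings (check kernel/memlayout.h in your tree). */
static const struct range kernel_map[] = {
    { 0x0c000000ULL, 0x0c000000ULL + 0x4000000ULL },        /* PLIC (assumed size) */
    { 0x10000000ULL, 0x10000000ULL + 0x1000ULL },           /* UART0, one page */
    { 0x10001000ULL, 0x10001000ULL + 0x1000ULL },           /* VIRTIO0, one page */
    { 0x80000000ULL, 0x80000000ULL + 128ULL * 1024 * 1024 }, /* KERNBASE to PHYSTOP */
};

/* Return 1 if va falls inside one of the modelled mappings, else 0. */
static int kernel_va_mapped(uint64_t va)
{
    size_t n = sizeof(kernel_map) / sizeof(kernel_map[0]);

    for (size_t i = 0; i < n; i++) {
        if (va >= kernel_map[i].start && va < kernel_map[i].end)
            return 1;
    }
    return 0;
}
```

In this model kernel_va_mapped(0x0) and kernel_va_mapped(0x1) are both 0, while an address inside the kernel text such as 0x8000283c is mapped, which is consistent with the load from address 0 or 1 faulting while the instruction fetch itself succeeds.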

Task 12

Goal:
Note that scause was printed by the kernel panic above, but often you need to look at additional info to track down the problem that caused the panic. For example, to find out which user process was running when the kernel paniced, you can print the process's name:

   (gdb) p p->name

First, set a breakpoint at the instruction that is about to cause the panic.

(gdb) b *0x8000283c
Breakpoint 1 at 0x8000283c: file kernel/syscall.c, line 138.
(gdb) target remote localhost:26544
Remote debugging using localhost:26544
0x0000000000001000 in ?? ()
(gdb) c
Continuing.
Thread 2 hit Breakpoint 1, 0x000000008000283c in syscall () at kernel/syscall.c:138
138       num = *(int *) 1;

Once execution stops at the breakpoint, the local variable p conveniently points to the struct proc, so we can print its name directly. This shows that the user process is the one running init.

(gdb) p p->name
$2 = "init", '\000' <repeats 11 times>

Task 13

Goal:
What is the name of the process that was running when the kernel paniced? What is its process id (pid)?

From Task 12, the name is "init".
According to struct proc in kernel/proc.h, pid is a field of struct proc, so we can print the process's pid directly here:

(gdb) p p->pid
$4 = 1

Reference

Lab page : https://pdos.csail.mit.edu/6.1810/2025/labs/syscall.html
