Series: [6.1810] Learning basic operating system concepts with MIT 6.1810
Assignment goal:
Use gdb to set breakpoints in the xv6-riscv kernel. The example here sets a breakpoint on the syscall function, then tries layout src and backtrace.
First, install gdb:
sudo apt-get install gdb-multiarch
Then, from the xv6-riscv directory, run make qemu-gdb:
*** Now run 'gdb' in another window.
qemu-system-riscv64 -machine virt -bios none -kernel kernel/kernel -m 128M -smp 3 -nographic -global virtio-mmio.force-legacy=false -drive file=fs.img,if=none,format=raw,id=x0 -device virtio-blk-device,drive=x0,bus=virtio-mmio-bus.0 -S -gdb tcp::26544
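Because of the -S flag, QEMU does not start the CPU and waits until gdb issues a continue command. We can now connect to QEMU with gdb: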
gdb-multiarch kernel/kernel
(gdb) target remote localhost:26544
Remote debugging using localhost:26544
0x0000000000001000 in ?? ()
At this point the PC is stopped at 0x1000.
Now we can work through the assignment.
(gdb) b syscall
Breakpoint 1 at 0x8000282a: file kernel/syscall.c, line 133.
(gdb) c
Continuing.
Thread 1 hit Breakpoint 1, syscall () at kernel/syscall.c:133
133 {
(gdb) layout src
(gdb) backtrace
#0 syscall () at kernel/syscall.c:133
#1 0x00000000800025f0 in usertrap () at kernel/trap.c:68
#2 0x0000003ffffff09c in ?? ()
(gdb) bt
#0 syscall () at kernel/syscall.c:133
#1 0x00000000800025f0 in usertrap () at kernel/trap.c:68
#2 0x0000003ffffff09c in ?? ()
Goal: Looking at the backtrace output, which function called syscall?
The output above shows that usertrap is the function that called syscall.
The source code confirms this: usertrap (kernel/trap.c) does call syscall.
Goal: Type n a few times to step past struct proc *p = myproc(); Once past this statement, type p /x *p, which prints the current process's proc struct (see kernel/proc.h) in hex.
Use the gdb command n to step until execution is past struct proc *p = myproc(), then run p /x *p:
(gdb) p /x *p
$1 = {lock = {locked = 0x0, name = 0x80007178, cpu = 0x0}, state = 0x4, chan = 0x0, killed = 0x0, xstate = 0x0, pid = 0x1, parent = 0x0,
kstack = 0x3fffffd000, sz = 0x4000, pagetable = 0x87f52000, trapframe = 0x87f56000, context = {ra = 0x80001e52, sp = 0x3fffffdc50,
s0 = 0x3fffffdc80, s1 = 0x8000fd88, s2 = 0x8000f958, s3 = 0x0, s4 = 0x3, s5 = 0x80020a28, s6 = 0x0, s7 = 0x180, s8 = 0xffffffffffffffff,
s9 = 0x400, s10 = 0x2, s11 = 0x38}, ofile = {0x0 <repeats 16 times>}, cwd = 0x8001de98, name = {0x69, 0x6e, 0x69, 0x74,
0x0 <repeats 12 times>}}
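The name field in the dump above is a 16-byte char array printed in hex. As a small illustration (not part of the lab), the following sketch decodes those bytes back into text. The helper name decode_c_string is hypothetical; the byte values are copied from the gdb output.

```python
# Bytes copied from the `name` field of the gdb dump: four letters
# followed by twelve NUL bytes ("0x0 <repeats 12 times>").
name_bytes = [0x69, 0x6e, 0x69, 0x74] + [0x0] * 12


def decode_c_string(raw):
    """Decode a fixed-size C char array, stopping at the first NUL byte.

    Hypothetical helper for illustration only.
    """
    chars = []
    for b in raw:
        if b == 0:
            break
        chars.append(chr(b))
    return "".join(chars)


process_name = decode_c_string(name_bytes)
print(process_name)
```

The decoded string matches what p p->name prints later in this article.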
Goal: What is the value of p->trapframe->a7 and what does that value represent? (Hint: look at user/init.c, the first user program xv6 starts, and its compiled assembly user/init.asm.)
What is the value of p->trapframe->a7?
(gdb) p p->trapframe->a7
$3 = 15
From user/usys.S we can see that a7 holds the number of the system call to execute.
From kernel/syscall.h we can see that a7 == 15 corresponds to SYS_open, so the syscall being invoked is sys_open in kernel/sysfile.c.
In user/init.asm, the open stub executes li a7, 15:
00000000000003c4 <open>:
.global open
open:
li a7, SYS_open
3c4: 48bd li a7,15
ecall
3c6: 00000073 ecall
ret
3ca: 8082 ret
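To make the role of a7 concrete, here is a minimal sketch of number-based dispatch, assuming a table that maps syscall numbers to handlers. It is an illustration of the idea, not the xv6 implementation: sys_open_stub, SYSCALLS, and dispatch are hypothetical names, and only the value 15 for SYS_open comes from the output above.

```python
SYS_open = 15  # the value seen in p->trapframe->a7 and in `li a7,15`


def sys_open_stub():
    # Hypothetical placeholder standing in for the kernel's sys_open handler.
    return "sys_open called"


# Hypothetical dispatch table: syscall number -> handler.
SYSCALLS = {SYS_open: sys_open_stub}


def dispatch(a7):
    """Look up the handler for the syscall number held in a7.

    Returns -1 for an unknown number (an assumption made for this sketch).
    """
    handler = SYSCALLS.get(a7)
    if handler is None:
        return -1
    return handler()


print(dispatch(15))
print(dispatch(99))
```

The user-side stub only loads the number into a7 and executes ecall; choosing the handler happens on the kernel side.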
Goal: The processor is running in supervisor mode, and we can print privileged registers such as sstatus.
(gdb) p/x $sstatus
$5 = 0x200000022
Goal: What was the previous mode that the CPU was in?
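The previous privilege mode is recorded in the SPP bit (bit 8) of sstatus: 0 means the trap came from user mode, 1 means supervisor mode. The sketch below decodes the value printed above. The bit positions follow the RISC-V privileged specification; the constant and function names are chosen here for illustration.

```python
# Bit positions in sstatus, per the RISC-V privileged specification.
SSTATUS_SIE = 1 << 1   # supervisor interrupt enable
SSTATUS_SPIE = 1 << 5  # previous interrupt enable
SSTATUS_SPP = 1 << 8   # previous privilege mode (0 = user, 1 = supervisor)


def previous_mode(sstatus):
    """Return the privilege mode the CPU was in before the trap."""
    return "supervisor" if sstatus & SSTATUS_SPP else "user"


sstatus = 0x200000022  # value printed by `p/x $sstatus` above
print(previous_mode(sstatus))
print(bool(sstatus & SSTATUS_SIE), bool(sstatus & SSTATUS_SPIE))
```

For 0x200000022 the SPP bit is clear, so the CPU was in user mode before the trap, which is consistent with the breakpoint being reached through a system call from init.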
Goal:
The xv6 kernel code contains consistency checks whose failure causes the kernel to panic; you may find that your kernel modifications cause panics. For example, replace the statement num = p->trapframe->a7; with num = * (int *) 0; at the beginning of syscall, run make qemu, and you will see something similar to:
xv6 kernel is booting
hart 2 starting
hart 1 starting
scause=0xd sepc=0x80001bfe stval=0x0
panic: kerneltrap
In my environment, the output is:
xv6 kernel is booting
hart 1 starting
hart 2 starting
scause=0xd sepc=0x8000283a stval=0x0
panic: kerneltrap
If num = * (int *) 0 is changed to num = * (int *) 1, stval becomes 1.
Goal:
To track down the source of a kernel page-fault panic, search for the sepc value printed for the panic you just saw in the file kernel/kernel.asm, which contains the assembly for the compiled kernel. Write down the assembly instruction the kernel is panicing at. Which register corresponds to the variable num?
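Searching kernel/kernel.asm for the sepc value (0x8000283a in my case; it may differ somewhat in each environment) shows that the faulting instruction is lw a3,0(zero).
The variable num corresponds to register a3, the destination of that load. stval holds the faulting address (0 here), not the value of num.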
Goal:
To inspect the state of the processor and the kernel at the faulting instruction, fire up gdb, and set a breakpoint at the faulting epc, like this:
(gdb) b *0x80001bfe
Breakpoint 1 at 0x80001bfe: file kernel/syscall.c, line 138.
(gdb) layout asm
(gdb) c
Continuing.
[Switching to Thread 1.3]
Thread 3 hit Breakpoint 1, syscall () at kernel/syscall.c:138
First, run make qemu to get the latest sepc:
xv6 kernel is booting
hart 1 starting
hart 2 starting
scause=0xd sepc=0x8000283c stval=0x1
panic: kerneltrap
The corresponding line in kernel/kernel.asm is:
8000283c: 00104703 lbu a4,1(zero) # 1 <_entry-0x7fffffff>
(gdb) b *0x8000283c
Breakpoint 1 at 0x8000283c: file kernel/syscall.c, line 138.
(gdb) target remote localhost:26544
(gdb) c
Continuing.
Thread 1 hit Breakpoint 1, 0x000000008000283c in syscall () at kernel/syscall.c:138
138 num = *(int *) 1;
(gdb) layout asm
│ 0x80002838 <syscall+14> mv s1,a0
│ 0x8000283a <syscall+16> li a3,1
│B+>0x8000283c <syscall+18> lbu a4,1(zero) # 0x1
│ 0x80002840 <syscall+22> lbu a5,1(a3)
Goal:
Confirm that the faulting assembly instruction is the same as the one you found above.
The previous exercise already confirmed that the instruction found in kernel/kernel.asm is the same as the faulting instruction shown by gdb.
Goal:
Why does the kernel crash? Hint: look at figure 3-3 in the text; is address 0 mapped in the kernel address space? Is that confirmed by the value in scause above? (See description of scause in RISC-V privileged instructions)
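In the xv6 kernel address space, address 0 is not mapped, so the load faults. The scause value agrees: 0xd is exception code 13, which the RISC-V privileged specification defines as a load page fault. The following sketch decodes scause under that reading; the table lists only the page-fault codes, and the function name is chosen here for illustration.

```python
# Page-fault exception codes from the RISC-V privileged specification.
# Only the codes relevant here are listed; the table is not complete.
EXCEPTION_NAMES = {
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_scause(scause, xlen=64):
    """Split scause into (is_interrupt, code, description).

    The top bit marks an interrupt; the remaining bits hold the cause code.
    """
    is_interrupt = bool(scause >> (xlen - 1))
    code = scause & ((1 << (xlen - 1)) - 1)
    if is_interrupt:
        description = "interrupt"
    else:
        description = EXCEPTION_NAMES.get(code, "other exception")
    return is_interrupt, code, description


print(decode_scause(0xd))
```

Together with stval, which holds the faulting address (0x0 in the first run, 0x1 after the change), this points to a read from an unmapped address.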
Goal:
Note that scause was printed by the kernel panic above, but often you need to look at additional info to track down the problem that caused the panic. For example, to find out which user process was running when the kernel paniced, you can print the process's name:
(gdb) p p->name
(gdb) b *0x8000283c
Breakpoint 1 at 0x8000283c: file kernel/syscall.c, line 138.
(gdb) target remote localhost:26544
Remote debugging using localhost:26544
0x0000000000001000 in ?? ()
(gdb) c
Continuing.
Thread 2 hit Breakpoint 1, 0x000000008000283c in syscall () at kernel/syscall.c:138
138 num = *(int *) 1;
p points to a struct proc, so we can print its name directly. This shows that the user process is the one running init.
(gdb) p p->name
$2 = "init", '\000' <repeats 11 times>
Goal:
What is the name of the process that was running when the kernel paniced? What is its process id (pid)?
From struct proc in kernel/proc.h, we can see that pid is a field of struct proc, so we can print the process's pid directly:
(gdb) p p->pid
$4 = 1