Series: [gem5] gem5 Study Notes from Scratch
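Getting performance data out of gem5 is straightforward: when a gem5 run finishes, it is expected to write the statistics to m5out/stats.txt. Note that if you launch gem5 through gdb --args, this file is not produced.
Use gem5.opt to run the Python configuration file developed earlier, and have it run the simple firmware.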
./gem5/build/RISCV/gem5.opt ./simple-riscv-vp.py --firmware ./firmware/build/simple.elf --cpu-type o3 --l1-icache
Then press Ctrl+C to stop the simulation. The stats.txt file should now appear in the m5out directory.
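To work with these numbers programmatically, a small parser is enough. The following is a minimal sketch, assuming each statistic line has the form `name value # description`; the function name `parse_stats` is hypothetical, and the two sample lines are copied from the CPU statistics quoted in this post.

```python
def parse_stats(text):
    """Parse stats.txt-style lines into a {name: float} dict."""
    stats = {}
    for line in text.splitlines():
        # Drop the trailing "# description" part, then split name and value.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            # Not a "name value" line (for example a banner line); skip it.
            continue
    return stats


sample = """\
system.cpu.cpi 13.537585 # CPI: cycles per instruction (core level) ((Cycle/Count))
system.cpu.ipc 0.073868 # IPC: instructions per cycle (core level) ((Count/Cycle))
"""

stats = parse_stats(sample)
print(stats["system.cpu.cpi"], stats["system.cpu.ipc"])
```

For a real run, the same function could be fed `open("m5out/stats.txt").read()`. If the file holds more than one statistics dump, this sketch keeps only the last value seen for each name.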
Here are the key CPU figures: cycles per instruction (CPI, how many cycles each instruction takes) and instructions per cycle (IPC, how many instructions complete per cycle). The two are reciprocals of each other.
system.cpu.cpi 13.537585 # CPI: cycles per instruction (core level) ((Cycle/Count))
system.cpu.ipc 0.073868 # IPC: instructions per cycle (core level) ((Count/Cycle))
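The reciprocal relationship can be checked directly with the two reported values. This is only a quick sanity check; the tolerance is an arbitrary choice that allows for the six-decimal rounding in stats.txt.

```python
import math

cpi = 13.537585  # system.cpu.cpi as reported above
ipc = 0.073868   # system.cpu.ipc as reported above

# If the two are reciprocals, their product should be very close to 1.
product = cpi * ipc
reciprocal_ok = math.isclose(product, 1.0, abs_tol=1e-4)

print(f"cpi * ipc = {product:.6f}")
print(f"1 / cpi   = {1 / cpi:.6f}")
print("reciprocal within tolerance:", reciprocal_ok)
```

A CPI above 13 means this run spends most of its cycles not retiring instructions.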
Branch predictor statistics appear here as well, for example the BTB hit ratio (the fraction of BTB lookups that hit).
system.cpu.branchPred.BTBLookups 67281 # Number of BTB lookups (Count)
system.cpu.branchPred.BTBUpdates 51 # Number of BTB updates (Count)
system.cpu.branchPred.BTBHits 57242 # Number of BTB hits (Count)
system.cpu.branchPred.BTBHitRatio 0.850790 # BTB Hit Ratio (Ratio)
system.cpu.branchPred.BTBMispredicted 43 # Number BTB mispredictions. No target found or target wrong (Count)
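The reported hit ratio can be reproduced from the raw counters. A minimal sketch, assuming BTBHitRatio is simply BTBHits divided by BTBLookups, which the numbers above are consistent with:

```python
btb_lookups = 67281  # system.cpu.branchPred.BTBLookups
btb_hits = 57242     # system.cpu.branchPred.BTBHits

# Fraction of BTB lookups that found an entry.
btb_hit_ratio = btb_hits / btb_lookups
print(f"BTB hit ratio = {btb_hit_ratio:.6f}")
```

Note that this ratio describes how often the BTB held an entry for the lookup. It is not the same thing as the accuracy of the branch direction prediction.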
For the icache, the output shows the number of hits and the number of misses.
system.l1_icache.ReadReq.hits::cpu.inst 173166 # number of ReadReq hits (Count)
system.l1_icache.ReadReq.hits::total 173166 # number of ReadReq hits (Count)
system.l1_icache.ReadReq.misses::cpu.inst 54 # number of ReadReq misses (Count)
system.l1_icache.ReadReq.misses::total 54 # number of ReadReq misses (Count)
system.l1_icache.ReadReq.missLatency::cpu.inst 2386500 # number of ReadReq miss ticks (Tick)
system.l1_icache.ReadReq.missLatency::total 2386500 # number of ReadReq miss ticks (Tick)
system.l1_icache.ReadReq.accesses::cpu.inst 173220 # number of ReadReq accesses(hits+misses) (Count)
system.l1_icache.ReadReq.accesses::total 173220 # number of ReadReq accesses(hits+misses) (Count)
system.l1_icache.ReadReq.missRate::cpu.inst 0.000312 # miss rate for ReadReq accesses (Ratio)
system.l1_icache.ReadReq.missRate::total 0.000312 # miss rate for ReadReq accesses (Ratio)
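The derived icache figures follow from the counters above. The sketch below recomputes the miss rate and also derives an average miss latency, which is not printed in the excerpt. Converting ticks to time assumes gem5's default resolution of 1 tick = 1 ps; that is an assumption about this setup, not something the output confirms.

```python
hits = 173166                 # system.l1_icache.ReadReq.hits::total
misses = 54                   # system.l1_icache.ReadReq.misses::total
miss_latency_ticks = 2386500  # system.l1_icache.ReadReq.missLatency::total

accesses = hits + misses
miss_rate = misses / accesses

# Average ticks spent per miss.
avg_miss_latency_ticks = miss_latency_ticks / misses
# Assumption: 1 tick = 1 ps, so 1000 ticks = 1 ns.
avg_miss_latency_ns = avg_miss_latency_ticks / 1000

print(f"accesses         = {accesses}")
print(f"miss rate        = {miss_rate:.6f}")
print(f"avg miss latency = {avg_miss_latency_ticks:.1f} ticks")
```

With only 54 misses in 173220 accesses, the icache is unlikely to be what drives the high CPI in this run.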
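My firmware does not enable paging, so the TLB is never used, and the TLB hit, miss, access, and page-walk counters all read 0. To experiment with the TLB I will need software that actually uses it. I hope to port xv6-riscv to gem5 later (xv6-riscv uses paging).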
system.cpu.mmu.dtb.readHits 0 # read hits (Count)
system.cpu.mmu.dtb.readMisses 0 # read misses (Count)
system.cpu.mmu.dtb.readAccesses 0 # read accesses (Count)
system.cpu.mmu.dtb.writeHits 0 # write hits (Count)
system.cpu.mmu.dtb.writeMisses 0 # write misses (Count)
system.cpu.mmu.dtb.writeAccesses 0 # write accesses (Count)
system.cpu.mmu.dtb.hits 0 # Total TLB (read and write) hits (Count)
system.cpu.mmu.dtb.misses 0 # Total TLB (read and write) misses (Count)
system.cpu.mmu.dtb.accesses 0 # Total TLB (read and write) accesses (Count)
system.cpu.mmu.dtb.walker.num_4kb_walks 0 # Completed page walks with 4KB pages (Count)
system.cpu.mmu.dtb.walker.num_64kb_walks 0 # Completed page walks with 64KB pages (Count)
system.cpu.mmu.dtb.walker.num_2mb_walks 0 # Completed page walks with 2MB pages (Count)
system.cpu.mmu.dtb.walker.power_state.pwrStateResidencyTicks::UNDEFINED 1971201000 # Cumulative time (in ticks) in various power states (Tick)
system.cpu.mmu.itb.readHits 0 # read hits (Count)
system.cpu.mmu.itb.readMisses 0 # read misses (Count)
system.cpu.mmu.itb.readAccesses 0 # read accesses (Count)
system.cpu.mmu.itb.writeHits 0 # write hits (Count)
system.cpu.mmu.itb.writeMisses 0 # write misses (Count)
system.cpu.mmu.itb.writeAccesses 0 # write accesses (Count)
system.cpu.mmu.itb.hits 0 # Total TLB (read and write) hits (Count)
system.cpu.mmu.itb.misses 0 # Total TLB (read and write) misses (Count)
system.cpu.mmu.itb.accesses 0 # Total TLB (read and write) accesses (Count)
system.cpu.mmu.itb.walker.num_4kb_walks 0 # Completed page walks with 4KB pages (Count)
system.cpu.mmu.itb.walker.num_64kb_walks 0 # Completed page walks with 64KB pages (Count)
system.cpu.mmu.itb.walker.num_2mb_walks 0 # Completed page walks with 2MB pages (Count)
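There is one problem: these statistics count every event from boot up to now, but the part I care about may be only the time dhrystone is running. How can I filter out the rest and measure only the region of interest (ROI)?
Hopefully there is a way to solve this. When I have time, I may try using M5ops for it.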
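One possible direction, not yet tried here: gem5's m5ops include operations to reset and to dump statistics, so a workload could reset the counters when the ROI starts and dump them when it ends. Each dump is written to stats.txt as its own section between "Begin Simulation Statistics" and "End Simulation Statistics" banner lines. The sketch below only covers the post-processing side, splitting such a file into one dict per dump. The function name `split_dumps` is hypothetical, and the values in `sample` are placeholders, not measured results.

```python
def split_dumps(text):
    """Return one {name: value-string} dict per dump section."""
    dumps = []
    current = None
    for line in text.splitlines():
        if "Begin Simulation Statistics" in line:
            current = {}
        elif "End Simulation Statistics" in line:
            if current is not None:
                dumps.append(current)
            current = None
        elif current is not None:
            # Drop the "# description" part, keep name and value.
            fields = line.split("#", 1)[0].split()
            if len(fields) >= 2:
                current[fields[0]] = fields[1]
    return dumps


# Placeholder content: two dumps with made-up values for illustration.
sample = """\
---------- Begin Simulation Statistics ----------
system.cpu.cpi 2.0 # placeholder value
---------- End Simulation Statistics   ----------

---------- Begin Simulation Statistics ----------
system.cpu.cpi 1.0 # placeholder value
---------- End Simulation Statistics   ----------
"""

dumps = split_dumps(sample)
print(len(dumps), dumps[-1]["system.cpu.cpi"])
```

Which section corresponds to the ROI depends on where the reset and dump calls are placed in the workload, so that mapping would have to be decided per experiment.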