[gem5][simple-rv-vp] Running Dhrystone and comparing the performance of different CPU models

os xv6

wtommy_fdgkhdkgh 2026-03-29 00:34:43


In this post I try running Dhrystone on gem5 and compare how much the performance differs between the in-order CPU model (minor) and the out-of-order CPU model (O3).

minor CPU model : https://github.com/gem5/gem5/tree/stable/src/cpu/minor
O3 CPU model : https://github.com/gem5/gem5/tree/stable/src/cpu/o3

The Dhrystone version used here comes from SiFive's repo, sifive/benchmark-dhrystone ( https://github.com/sifive/benchmark-dhrystone/tree/master ).

I also referred to T410N/dhrystone-rv32i-baremetal ( https://github.com/T410N/dhrystone-rv32i-baremetal/tree/master ), because I want to run Dhrystone in a bare-metal environment as well.

Getting Dhrystone to run required a few workarounds, which I briefly list here.

https://github.com/TommyWu-fdgkhdkgh/simple-riscv-vp/blob/main/firmware/Makefile#L26

LIBS = -lgcc

Without this, the link step fails with a pile of errors like these.

undefined reference to __mulsi3
undefined reference to __floatsisf
undefined reference to __mulsf3
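For context, these symbols are libgcc runtime helpers: `__mulsi3` is a 32-bit integer multiply, `__floatsisf` converts a signed integer to a single-precision float, and `__mulsf3` multiplies two single-precision floats. The compiler emits calls to them when the target ISA has no instruction for the operation, which presumably is the case here (the firmware being built without the M and F extensions is an assumption, not something stated above). The sketch below shows the kind of shift-and-add routine such a multiply helper performs. `soft_mul32` is a hypothetical name for illustration, not libgcc's actual implementation.

```c
#include <stdint.h>

/* Illustrative shift-and-add multiply, similar in spirit to the helper the
 * compiler calls when the target has no hardware multiplier.
 * `soft_mul32` is a hypothetical name; libgcc's real routine is __mulsi3. */
uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    while (b != 0) {
        if (b & 1u) {
            result += a;   /* add the shifted multiplicand for each set bit */
        }
        a <<= 1;           /* next bit of b is worth twice as much */
        b >>= 1;
    }
    return result;         /* wraps modulo 2^32, like a 32-bit multiply */
}
```

Linking with `-lgcc` pulls in the real versions of these helpers, so nothing like this has to be written by hand.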

https://github.com/TommyWu-fdgkhdkgh/simple-riscv-vp/blob/main/firmware/dhrystone/dhrystone_main.c#L100-L106

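Originally this code uses malloc to request a chunk of space from the heap section. Since I am really not capable of writing a dynamic memory allocator for a bare-metal environment myself (these days you could probably ask Gemini CLI to do it for you!), I took the lazy route and used global variables instead.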
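As a rough illustration of this workaround, the sketch below replaces two heap allocations with statically allocated globals. The `record_t` type and all names here are simplified, hypothetical stand-ins, not the actual declarations in dhrystone_main.c.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for Dhrystone's record type. */
typedef struct record {
    struct record *next;
    int            int_comp;
    char           str_comp[31];
} record_t;

/* Statically allocated storage takes the place of two malloc() calls,
 * so no heap or allocator is needed on bare metal. */
static record_t rec_storage;
static record_t next_rec_storage;

record_t *ptr_glob;
record_t *next_ptr_glob;

/* Point the global pointers at the static storage and link the records. */
void init_records(void)
{
    next_ptr_glob = &next_rec_storage;
    ptr_glob = &rec_storage;
    ptr_glob->next = next_ptr_glob;
}
```

This works because the benchmark only ever needs a fixed number of records, so their lifetime can simply be the lifetime of the program.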

Elapsed time is measured with mtime here rather than with times.
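On a 32-bit core, the 64-bit mtime counter has to be read as two 32-bit halves, which matches the separate low and high values the firmware prints. A minimal sketch of such a read is shown below. `read_mtime64` is a hypothetical helper, and the register layout (low word first, then high word) is an assumption about the platform.

```c
#include <stdint.h>

/* Read a 64-bit counter exposed as two 32-bit registers (low word at
 * index 0, high word at index 1; this layout is an assumption).
 * Re-reading the high word guards against a carry from the low word
 * into the high word between the two reads. */
uint64_t read_mtime64(const volatile uint32_t *regs)
{
    uint32_t hi, lo;

    do {
        hi = regs[1];
        lo = regs[0];
    } while (regs[1] != hi);   /* retry if the high word changed */

    return ((uint64_t)hi << 32) | lo;
}
```

On real firmware, `regs` would point at the memory-mapped mtime register. The address depends on the platform's memory map (SiFive-style CLINTs conventionally place mtime at offset 0xBFF8 from the CLINT base), so it should be checked against the virtual platform's configuration rather than taken from this sketch.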

https://github.com/TommyWu-fdgkhdkgh/simple-riscv-vp/blob/main/firmware/dhrystone/mini_lib.c#L5

This code is taken from ( https://github.com/T410N/dhrystone-rv32i-baremetal/blob/master/mini_libc.c#L64 ). Thank you so much!!

https://github.com/T410N/dhrystone-rv32i-baremetal/blob/master/strcmp.S

This one was taken from SiFive's repo. Thanks, SiFive!!

This part is taken from ( https://github.com/T410N/dhrystone-rv32i-baremetal/blob/master/Makefile#L26 ). Thanks a lot!!

With that, Dhrystone finally runs, so we can check whether different CPU models also differ in Dhrystone performance.

./gem5/build/RISCV/gem5.opt ./simple-riscv-vp.py --firmware ./firmware/build/simple.elf --cpu-type minor  --l1-icache

After starting the in-order CPU model (minor), a simple menu appears. Press d to start running Dhrystone.

==== m5 terminal: Terminal 0 ====
start simple firmware!
============ menu ============
a : calculate a big number and insert mtime interrupt
m : set mtime cmp
M : test print mcycle
d : run dhrystone
T : test print mtime
==============================
run dhrystone !
============================
start to run `dhrystone_main`
============================

Dhrystone Benchmark, Version 2.1 (Language: C)

Program compiled without 'register' attribute

Execution starts, 6000 runs through Dhrystone

Once Dhrystone finishes, we get the mtime difference.

end_mtime_low : 6367340
end_mtime_high : 0
begin_mtime_low : 469513
begin_mtime_high : 0
use_mtime_low : 5897827
use_mtime_high : 0

You can repeat this several times to see whether the numbers fluctuate much. In my environment they were quite stable!

Next, run exactly the same experiment with the out-of-order CPU model (O3).

./gem5/build/RISCV/gem5.opt ./simple-riscv-vp.py --firmware ./firmware/build/simple.elf --cpu-type o3 --l1-icache

The final output is:

end_mtime_low : 842930
end_mtime_high : 0
begin_mtime_low : 97884
begin_mtime_high : 0
use_mtime_low : 745046
use_mtime_high : 0

The result shows that the out-of-order CPU is almost 8 times faster than the in-order CPU!

in-order CPU model : 5897827
out-of-order CPU model : 745046
5897827 / 745046 = 7.916
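The arithmetic above can be sketched in C as follows. The helper names are hypothetical; the numbers in the usage note are the ones printed by the two runs.

```c
#include <stdint.h>

/* Combine the printed high/low halves into one 64-bit tick count. */
uint64_t combine_ticks(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Elapsed ticks between two mtime samples. */
uint64_t elapsed_ticks(uint64_t begin, uint64_t end)
{
    return end - begin;
}

/* How many times faster the run with fewer ticks is. */
double speedup(uint64_t slow_ticks, uint64_t fast_ticks)
{
    return (double)slow_ticks / (double)fast_ticks;
}
```

For the minor run, `elapsed_ticks(combine_ticks(0, 469513), combine_ticks(0, 6367340))` gives 5897827, and for the O3 run the same calculation gives 745046, so `speedup(5897827, 745046)` is about 7.916. Since both runs execute 6000 Dhrystone iterations, that is roughly 983 mtime ticks per iteration on minor versus roughly 124 on O3.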

This shows that a timing model such as gem5 lets us experiment with whether a new hardware design, a new micro-architecture, or a new pipeline design delivers better performance on the same workload (Dhrystone in this example).

We can also try adjusting the cache size, the branch predictor design, and so on, to see how much each design choice improves performance on the same workload.

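But the simulator shows an 8x speedup. If these two pieces of hardware were actually built in the real world, would the difference really be 8x?
Would the theoretical numbers produced inside the simulator differ from the values measured on real hardware? These both feel like interesting questions.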

Reference

T410N/dhrystone-rv32i-baremetal
https://github.com/T410N/dhrystone-rv32i-baremetal/tree/master
sifive/benchmark-dhrystone
https://github.com/sifive/benchmark-dhrystone/blob/master/dhry.h
riscv-software-src/riscv-tests/dhrystone
https://github.com/riscv-software-src/riscv-tests/tree/master/benchmarks/dhrystone
CPU那些事儿 - CPU benchmarks(上) (All about CPUs: CPU benchmarks, part 1)
https://juejin.cn/post/7534535266225750062#:~:text=CPU%E9%82%A3%E4%BA%9B%E4%BA%8B%E5%84%BF-%20CPU,%E3%80%81coremark%E3%80%81lmbench%20-%20%E6%8E%98%E9%87%91
测量处理器运算能力 dhrystone (Measuring processor compute capability: dhrystone)
http://yoc.docs.t-head.cn/icebook/Chapter2-%E5%8A%9F%E8%83%BD%E6%BC%94%E7%A4%BA/%E6%80%A7%E8%83%BD%E5%88%86%E6%9E%90%E5%B7%A5%E5%85%B7/1-dhrystone.html
痞子衡嵌入式：微处理器CPU性能测试基准(Dhrystone) (Microprocessor CPU performance benchmark: Dhrystone)
https://www.cnblogs.com/henjay724/p/10856831.html
使用 Dhrystone 评估 CPU 整数性能 (Evaluating CPU integer performance with Dhrystone)
https://doc.openvela.com/document?id=370&version=trunk&language=cn
