[Yolov8-TensorRT] - 5. Yolov8 + TensorRT Acceleration

  1. yolov8 vs yolov8 + tensorRT (converting pt → onnx → engine)
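The post doesn't show how the .engine file was produced. Below is a plausible reconstruction of the pt → onnx → engine path, assuming the standard ultralytics export API plus NVIDIA's trtexec CLI; the exact commands used here are not given in the original:

from ultralytics import YOLO

# Step 1: export the PyTorch weights to ONNX.
model = YOLO("./Weights/yolov8n.pt", task="detect")
model.export(format="onnx")  # writes ./Weights/yolov8n.onnx

# Step 2: build a TensorRT engine from the ONNX file, e.g. with trtexec:
#   trtexec --onnx=./Weights/yolov8n.onnx --saveEngine=./Weights/yolov8n.engine
# An engine built this way carries no ultralytics metadata, which is
# consistent with the "Metadata not found" warning in the output below.

With both weight files in place, the comparison script: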
from ultralytics import YOLO
import os

# Run from the script's own directory so the relative paths resolve.
os.chdir(os.path.dirname(os.path.abspath(__file__)))

# Baseline: plain PyTorch weights.
model = YOLO("./Weights/yolov8n.pt", task="detect")
result = model.predict("./Image/bus.jpg", save=True)

# TensorRT engine built from the same weights.
model = YOLO("./Weights/yolov8n.engine", task="detect")
result = model.predict("./Image/bus.jpg", save=True)
yolov8
image 1/1 d:\05.Python\yolov8_tensorRT\Image\bus.jpg: 640x480 4 persons, 1 bus, 1 stop sign, 44.9ms
Speed: 2.0ms preprocess, 44.9ms inference, 53.4ms postprocess per image at shape (1, 3, 640, 480)


yolov8 + tensorRT
Loading Weights\yolov8n.engine for TensorRT inference...
[11/09/2024-18:59:07] [TRT] [I] Loaded engine size: 9 MiB
[11/09/2024-18:59:07] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +12, now: CPU 0, GPU 18 (MiB)
WARNING ⚠️ Metadata not found for 'model=./Weights/yolov8n.engine'

image 1/1 d:\05.Python\yolov8_tensorRT\Image\bus.jpg: 640x640 4 class0s, 1 class5, 2.0ms
Speed: 2.0ms preprocess, 2.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)

(Note how the labels degrade to class0/class5 here: an engine built outside ultralytics' own export carries no class-name metadata, hence the "Metadata not found" warning, so the detections fall back to raw class indices.)
  • yolov8
    https://ithelp.ithome.com.tw/upload/images/20241214/2013473699xVqtAPWP.jpg
  • yolov8 + tensorRT
    https://ithelp.ithome.com.tw/upload/images/20241214/20134736usi1hUWt38.jpg
  2. yolov8 + tensorRT: comparing fp32 / fp16 / int8 (converting pt → engine)
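As above, the export commands are not shown in the post; here is a minimal sketch assuming the ultralytics export API. half=True selects FP16, and int8=True enables INT8 quantization, which needs a calibration dataset ("coco128.yaml" is a placeholder of mine). Each export writes ./Weights/yolov8n.engine, so the three files below were presumably renamed by hand:

from ultralytics import YOLO

model = YOLO("./Weights/yolov8n.pt", task="detect")
model.export(format="engine")                                  # FP32 (default) -> rename to yolov8n_fp32.engine
model.export(format="engine", half=True)                       # FP16 -> rename to yolov8n_fp16.engine
model.export(format="engine", int8=True, data="coco128.yaml")  # INT8 -> rename to yolov8n_int8.engine

The comparison script: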
from ultralytics import YOLO
import os

os.chdir(os.path.dirname(os.path.abspath(__file__)))

print("pt to engine fp32:\n")
model = YOLO("./Weights/yolov8n_fp32.engine", task="detect")
result = model.predict("./Image/bus.jpg", save=True)

print("pt to engine fp16:\n")
model = YOLO("./Weights/yolov8n_fp16.engine", task="detect")
result = model.predict("./Image/bus.jpg", save=True)

print("pt to engine int8:\n")
model = YOLO("./Weights/yolov8n_int8.engine", task="detect")
result = model.predict("./Image/bus.jpg", save=True)
pt to engine fp32:
Loading Weights\yolov8n_fp32.engine for TensorRT inference...
[11/09/2024-21:22:58] [TRT] [I] Loaded engine size: 20 MiB
[11/09/2024-21:22:58] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +18, now: CPU 0, GPU 34 (MiB)

image 1/1 d:\05.Python\yolov8_tensorRT\Image\bus.jpg: 640x640 4 persons, 1 bus, 3.0ms
Speed: 2.7ms preprocess, 3.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)

pt to engine fp16:
Loading Weights\yolov8n_fp16.engine for TensorRT inference...
[11/09/2024-21:22:58] [TRT] [I] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
[11/09/2024-21:22:58] [TRT] [I] Loaded engine size: 9 MiB
[11/09/2024-21:22:58] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +13, now: CPU 1, GPU 54 (MiB)

image 1/1 d:\05.Python\yolov8_tensorRT\Image\bus.jpg: 640x640 4 persons, 1 bus, 2.0ms
Speed: 1.5ms preprocess, 2.0ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 640)

pt to engine int8:
Loading Weights\yolov8n_int8.engine for TensorRT inference...
[11/09/2024-21:22:58] [TRT] [I] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
[11/09/2024-21:22:58] [TRT] [I] Loaded engine size: 7 MiB
[11/09/2024-21:22:58] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2500, now: CPU 1, GPU 2557 (MiB)

image 1/1 d:\05.Python\yolov8_tensorRT\Image\bus.jpg: 640x640 3 persons, 1 bus, 2.0ms
Speed: 2.0ms preprocess, 2.0ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 640)

https://ithelp.ithome.com.tw/upload/images/20241214/20134736DzKuwJcW9R.jpg
https://ithelp.ithome.com.tw/upload/images/20241214/20134736oyHPvDzltC.jpg

  • FP32: best accuracy, but the longest inference time and the largest engine (20 MiB, 3.0 ms above).
  • FP16: a good balance between inference time and model size (9 MiB, 2.0 ms).
  • INT8: the smallest engine and the fastest inference (7 MiB, 2.0 ms), but with some possible accuracy loss, visible above where it detects 3 persons instead of 4.
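A single image is a noisy benchmark. To compare the three engines more fairly, one can average latency over repeated runs after a warm-up; the benchmark helper below and its run counts are my own additions, not from the original post:

import time
from ultralytics import YOLO

def benchmark(weights, image, runs=50, warmup=5):
    # Average inference latency in ms/image, excluding warm-up iterations
    # so the engine's one-time setup cost is not counted.
    model = YOLO(weights, task="detect")
    for _ in range(warmup):
        model.predict(image, verbose=False)
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(image, verbose=False)
    return (time.perf_counter() - start) / runs * 1000

for w in ["./Weights/yolov8n_fp32.engine",
          "./Weights/yolov8n_fp16.engine",
          "./Weights/yolov8n_int8.engine"]:
    print(f"{w}: {benchmark(w, './Image/bus.jpg'):.1f} ms/image")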

References:
nvidia: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#python_topics
ultralytics: https://docs.ultralytics.com/integrations/tensorrt/#usage


2 comments

窮嘶發發發
iT邦高手 1 級 ‧ 2024-12-18 10:49:57

Plagiarizing images and text without citing the source; I have no choice but to block you.

franx0722
iT邦新手 2 級 ‧ 2024-12-19 21:18:26

https://ithelp.ithome.com.tw/upload/images/20241219/20134736HpwQyX7zpR.png
https://ithelp.ithome.com.tw/upload/images/20241219/20134736kQBdrfKJ6m.png
Is it OK if I copy from myself?
