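YOLO is an object detection system that predicts from the global information of an image, and it has been iterated on at a very fast pace. The upgrade from YOLOv5 to YOLOv8 mainly covers the network architecture and algorithm, the command-line interface, and the Python API. YOLOv8 can be trained on large datasets and runs on a wide range of hardware platforms. Another key feature is its extensibility: because it is designed as a framework that supports all previous YOLO versions, it is easy to switch between versions and compare their performance.
In this installment, "Object Detection Upgraded Again! A Hands-On, Step-by-Step YOLOv8 Tutorial", Lin Song, an outstanding developer from the Horizon developer community, walks through deploying the YOLOv8 object detection model on the Horizon Sunrise X3 Pi (hereafter "Sunrise X3 Pi"), followed by a first look at accuracy and speed. If you have questions, register and join the Horizon developer community to discuss them; the configuration files and code are also available there.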
Environment Setup
The scripts and directory layout used in this article are as follows:
├── project                        # X3 working directory
│   ├── calib_f32                  # quantization calibration dataset
│   ├── coco128                    # images for calibration and detection
│   ├── config.yaml                # ONNX-to-bin model conversion config
│   ├── modules.py -> ../ultralytics/ultralytics/nn/modules.py  # symlink to the YOLOv8 post-processing file
│   ├── onnxruntime-infer.py       # loads the ONNX model and runs detection on the PC
│   ├── requirements.txt           # Python dependencies
│   ├── step1_export_onnx.py       # YOLOv8 ONNX export
│   ├── step2_make_calib.py        # builds the quantization calibration dataset
│   ├── step3_convert_bin.sh       # ONNX-to-bin conversion script
│   ├── step4_inference.py         # X3 inference code
│   ├── yolo-comparison-plots.png  # YOLO model comparison plot
│   ├── yolov8n.onnx               # exported ONNX model
│   ├── yolov8n.pt                 # YOLOv8 PyTorch weights
│   └── yolov8n_horizon.bin        # converted bin model
├── ultralytics                    # YOLOv8 repository
│   ├── CITATION.cff
│   ├── CONTRIBUTING.md
│   ├── LICENSE
│   ├── MANIFEST.in
│   ├── README.md
│   ├── README.zh-CN.md
│   ├── docker
│   ├── docs
│   ├── examples
│   ├── mkdocs.yml
│   ├── requirements.txt
│   ├── setup.cfg
│   ├── setup.py
│   ├── tests
│   └── ultralytics
YOLOv8 PyTorch Environment Setup
Export the ONNX model on your development machine. First install PyTorch, ONNX, and the other dependencies, then install YOLOv8:
cd project
python3 -m pip install -r requirements.txt
cd ../ultralytics
python3 setup.py install
cd ../project
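Model Export
Modifying the YOLOv8 Post-Processing Code
Symlink ultralytics/ultralytics/nn/modules.py from the YOLOv8 repository to project/modules.py so that the modified code is easy to locate. The modification relies on two tricks: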
# *************************************************************************************************************** #
# *************************************************************************************************************** #
# Original repository version, which includes post-processing: comment it out!
# def forward(self, x):
#     shape = x[0].shape  # BCHW
#     for i in range(self.nl):
#         x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
#     if self.training:
#         return x
#     elif self.dynamic or self.shape != shape:
#         self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
#         self.shape = shape
#
#     box, cls = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2).split((self.reg_max * 4, self.nc), 1)
#     dbox = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) * self.strides
#     y = torch.cat((dbox, cls.sigmoid()), 1)
#     return y if self.export else (y, x)
# *************************************************************************************************************** #
# *************************************************************************************************************** #
# Version used for X3 deployment
def forward(self, x):
    res = []
    for i in range(self.nl):
        # Raw box and score branches per detection scale, permuted from NCHW to NHWC
        bboxes = self.cv2[i](x[i]).permute(0, 2, 3, 1)
        scores = self.cv3[i](x[i]).permute(0, 2, 3, 1)
        res.append(bboxes)
        res.append(scores)
    # Returning a tuple avoids the export error
    return tuple(res)
# *************************************************************************************************************** #
# *************************************************************************************************************** #
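- Exporting a Transpose (permute) node
bboxes = self.cv2[i](x[i]).permute(0, 2, 3, 1)
scores = self.cv3[i](x[i]).permute(0, 2, 3, 1)
The Sunrise X3 Pi expects models in NHWC layout, whereas models trained with PyTorch are NCHW. When the exported ONNX model is converted to bin, the toolchain therefore inserts Transpose nodes at the head and tail of the network, with the axis order [0, 3, 1, 2]. That order exactly cancels the [0, 2, 3, 1] permute inserted here, which is equivalent to removing one Transpose node. This avoids unnecessary computation and speeds up inference.
- Returning the outputs as a tuple
This step is mainly there so that YOLOv8 exports without errors; returning a list instead triggers a tuple-related error.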
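The claim that the two axis orders cancel can be checked without the toolchain. The sketch below is not part of the original scripts: `permute_shape` and `compose` are hypothetical helpers that follow the axis semantics of `permute`/`transpose`, and the 1x64x80x80 shape is only an illustrative assumption.

```python
def permute_shape(shape, order):
    """Shape after permuting axes: output axis k takes input axis order[k]."""
    return tuple(shape[axis] for axis in order)


def compose(first, second):
    """Single axis order equivalent to applying `first`, then `second`."""
    return [first[axis] for axis in second]


head_permute = [0, 2, 3, 1]       # NCHW -> NHWC, added in forward()
toolchain_permute = [0, 3, 1, 2]  # NHWC -> NCHW, the order quoted above

nchw = (1, 64, 80, 80)            # illustrative shape (assumption)
nhwc = permute_shape(nchw, head_permute)
restored = permute_shape(nhwc, toolchain_permute)

print(nhwc)                                      # (1, 80, 80, 64)
print(restored)                                  # (1, 64, 80, 80)
print(compose(head_permute, toolchain_permute))  # [0, 1, 2, 3], the identity
```

Since the composition is the identity order, a converter that recognizes the pair can in principle drop both nodes, which is the saving described above.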
Exporting ONNX with YOLOv8
Run step1_export_onnx.py to download the official weights and export the ONNX model.
# Import YOLOv8
from ultralytics import YOLO

# Load the pretrained weights
model = YOLO("yolov8n.pt")

# Export with opset=11 and simplify the ONNX graph with onnx-sim
success = model.export(format="onnx", opset=11, simplify=True)
python3 step1_export_onnx.py
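Note: the Sunrise X3 Pi supports ONNX opset 10 and 11 only; models exported with any other opset version will not compile with the model toolchain.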
Running the Exported ONNX Model with ONNXRuntime
To catch export errors early, it is best to verify the exported ONNX model with ONNXRuntime.
import cv2
import numpy as np


def letterbox(im, new_shape=(640, 640), color=114):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # Compute padding
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right,
                            cv2.BORDER_CONSTANT,
                            value=(color, color, color))  # add border
    return im, 1 / r, (dw, dh)


def ratioresize(im, new_shape=(640, 640), color=114):
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    new_h, new_w = new_shape
    # Canvas pre-filled with the padding color
    padded_img = np.ones((new_h, new_w, 3), dtype=np.uint8) * color

    # Scale ratio (new / old)
    r = min(new_h / shape[0], new_w / shape[1])

    # Size of the resized image before padding (width, height)
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))

    if shape[::-1] != new_unpad:
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)

    # Paste into the top-left corner so the padding ends up on the right and bottom
    padded_img[: new_unpad[1], : new_unpad[0]] = im
    padded_img = np.ascontiguousarray(padded_img)
    return padded_img, 1 / r, (0, 0)
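This article uses two image scaling methods. letterbox is the method YOLOv8 enables during training; it pads all four sides, so post-processing has to undo the padding offsets, which is somewhat cumbersome. ratioresize keeps the aspect ratio of the image and pads only on the right and bottom, so post-processing does not need to compute any offset.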
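To make the difference concrete, here is a small worked example of mapping a point from model-input coordinates back to the source image under each scheme. It is a sketch rather than the article's `postprocess` function (which is not shown): `scale_and_pad` and `to_original` are hypothetical helpers, the 1280x720 image size is made up, and the integer rounding of the resized image is left out for brevity.

```python
def scale_and_pad(img_h, img_w, model_h=640, model_w=640, centered=True):
    """Return (r, dw, dh): scale ratio plus the left and top padding in pixels.

    centered=True mimics letterbox (padding split over both sides);
    centered=False mimics ratioresize (padding only on the right and bottom).
    """
    r = min(model_h / img_h, model_w / img_w)
    pad_w = model_w - img_w * r
    pad_h = model_h - img_h * r
    if centered:
        return r, pad_w / 2, pad_h / 2
    return r, 0.0, 0.0


def to_original(x, y, r, dw, dh):
    """Map a point from model-input coordinates back to the source image."""
    return (x - dw) / r, (y - dh) / r


# Hypothetical 1280x720 (width x height) image, 640x640 model input.
r, dw, dh = scale_and_pad(720, 1280, centered=True)
print(r, dw, dh)                         # 0.5 0.0 140.0
print(to_original(100, 200, r, dw, dh))  # (200.0, 120.0)

r, dw, dh = scale_and_pad(720, 1280, centered=False)
print(r, dw, dh)                         # 0.5 0.0 0.0
print(to_original(100, 60, r, dw, dh))   # (200.0, 120.0)
```

Both calls recover the same source point, but with right-bottom padding the offsets are always zero, so the mapping reduces to a single division by the scale ratio.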
import time
from pathlib import Path

import cv2
import numpy as np
import onnxruntime

# blob, postprocess, nms, CLASSES and COLORS are not shown in this excerpt.

if __name__ == '__main__':
    images_path = Path('./coco128')
    model_path = Path('./yolov8n.onnx')

    score_thres = 0.4
    iou_thres = 0.65
    num_classes = 80

    try:
        session = onnxruntime.InferenceSession(str(model_path), providers=['CPUExecutionProvider'])
        model_h, model_w = session.get_inputs()[0].shape[2:]
    except Exception as e:
        print(f'Load model error.\n{e}')
        exit()
    else:
        try:
            # Warm up with 10 inference runs
            for _ in range(10):
                session.run(None, {'images': np.random.randn(1, 3, model_h, model_w).astype(np.float32)})
        except Exception as e:
            print(f'Warm up model error.\n{e}')

    cv2.namedWindow("results", cv2.WINDOW_AUTOSIZE)

    for img_path in images_path.iterdir():
        image = cv2.imread(str(img_path))
        t0 = time.perf_counter()

        ## yolov8 training letterbox
        # resized, ratio, (dw, dh) = letterbox(image, (model_h, model_w))
        resized, ratio, (dw, dh) = ratioresize(image, (model_h, model_w))
        buffer = blob(resized)
        t1 = time.perf_counter()

        outputs = session.run(None, {'images': buffer})
        outputs = [o[0] for o in outputs]  # drop the batch dimension
        t2 = time.perf_counter()

        results = postprocess(
            outputs, score_thres, iou_thres,
            image.shape[0], image.shape[1],
            dh, dw, ratio, ratio,
            16, num_classes)
        results = nms(*results)
        t3 = time.perf_counter()

        # Draw each detection on the original image
        for (x0, y0, x1, y1, score, label) in results:
            x0, y0, x1, y1 = map(int, [x0, y0, x1, y1])
            cls_id = int(label)
            cls = CLASSES[cls_id]
            color = COLORS[cls]
            cv2.rectangle(image, [x0, y0], [x1, y1], color, 1)
            cv2.putText(image, f'{cls}:{score:.3f}', (x0, y0 - 2),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.325, [0, 0, 225], thickness=1)
        t4 = time.perf_counter()
        cv2.imshow('results', image)
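The code above is the main inference routine. To keep the printed timings stable, it runs 10 warm-up inferences first; be sure to warm up the model in on-device deployments as well. The detections come out correct, which confirms that the post-processing logic is right.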
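The warm-up pattern can be pulled out into a small helper. The following is a minimal sketch, not code from the article: `benchmark` and `fake_inference` are hypothetical names, and `fake_inference` merely stands in for a call such as `session.run(...)`.

```python
import time


def benchmark(fn, warmup=10, runs=20):
    """Call fn `warmup` times untimed, then return the mean latency in ms over `runs` calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / runs


calls = []


def fake_inference():
    # Stand-in for session.run(...): records the call and burns a little CPU time.
    calls.append(1)
    return sum(i * i for i in range(10_000))


mean_ms = benchmark(fake_inference, warmup=10, runs=20)
print(f"mean latency: {mean_ms:.3f} ms over 20 timed runs ({len(calls)} calls in total)")
```

Keeping the warm-up calls outside the timed region means one-off costs such as lazy initialization and cache fills do not skew the reported average.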
Generating the Quantization Calibration Dataset
Run step2_make_calib.py to read 50 random images from the coco128 directory and build the calibration dataset.
img = cv2.imread(str(i))
img = letterbox(img)[0]
img = blob(img[:, :, ::-1])  # bgr -> rgb
print(img.shape)
img.astype(np.float32).tofile(str(save / (i.stem + '.rgbchw')))
Building the calibration dataset boils down to: read image -> resize -> convert uint8 to float -> numpy.tofile. Running the script generates 50 files with the .rgbchw suffix in the calib_f32 directory:
python3 step2_make_calib.py
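Because numpy.tofile writes raw bytes with no header, the shape and dtype have to be known when a calibration file is read back. The sketch below round-trips a synthetic array through a temporary file; the 1x3x640x640 float32 layout is an assumption (a 640x640 RGB image in NCHW order), and sample.rgbchw is a made-up file name.

```python
import tempfile
from pathlib import Path

import numpy as np

shape = (1, 3, 640, 640)  # assumed NCHW layout of one calibration sample

# Synthetic stand-in for a preprocessed image (values in the uint8 range).
rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=shape, dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.rgbchw"           # hypothetical file name
    sample.astype(np.float32).tofile(str(path))  # same kind of call as in step2_make_calib.py

    size_bytes = path.stat().st_size
    restored = np.fromfile(str(path), dtype=np.float32).reshape(shape)

print(size_bytes)                        # 4915200 = 1 * 3 * 640 * 640 * 4 bytes
print(np.array_equal(restored, sample))  # True
```

Checking the file size against the expected element count is a quick way to spot a calibration file that was written with the wrong dtype or resolution.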
Compiling the bin Model with the Docker Image Provided by Horizon
Download docker_openexplorer_centos_7_xj3_v2.4.2.tar.gz to your local development machine and start Docker with the following commands:
cd ../
wget -c ftp://vrftp.horizon.ai/Open_Explorer_gcc_9.3.0/2.4.2/docker_openexplorer_centos_7_xj3_v2.4.2.tar.gz
docker load -i docker_openexplorer_centos_7_xj3_v2.4.2.tar.gz
docker run -it --name horizonX3 -v ${PWD}/project:/open_explorer/project openexplorer/ai_toolchain_centos_7_xj3:v2.4.2
docker exec -it horizonX3 /bin/bash
Once inside the container, run:
cd project
bash step3_convert_bin.sh
After a successful build, the following log is printed:
/model.22/cv3.2/cv3.2.2/Conv BPU id(0) HzSQuantizedConv 0.998216 67.505043
2023-01-31 21:17:24,261 INFO [Tue Jan 31 21:17:24 2023] End to Horizon NN Model Convert.
2023-01-31 21:17:24,315 INFO start convert to *.bin file....
2023-01-31 21:17:24,345 INFO ONNX model output num : 6
2023-01-31 21:17:24,346 INFO ############# model deps info #############
2023-01-31 21:17:24,346 INFO hb_mapper version : 1.9.9
2023-01-31 21:17:24,346 INFO hbdk version : 3.37.2
2023-01-31 21:17:24,346 INFO hbdk runtime version: 3.14.14
2023-01-31 21:17:24,346 INFO horizon_nn version : 0.14.0
2023-01-31 21:17:24,346 INFO ############# model_parameters info #############
2023-01-31 21:17:24,346 INFO onnx_model : /open_explorer/workspace/yolov8/yolov8n.onnx
2023-01-31 21:17:24,346 INFO BPU march : bernoulli2
2023-01-31 21:17:24,346 INFO layer_out_dump : False
2023-01-31 21:17:24,346 INFO log_level : DEBUG
2023-01-31 21:17:24,346 INFO working dir : /open_explorer/workspace/yolov8/model_output
2023-01-31 21:17:24,346 INFO output_model_file_prefix: yolov8n_horizon
2023-01-31 21:17:24,347 INFO ############# input_parameters info #############
2023-01-31 21:17:24,347 INFO ------------------------------------------
2023-01-31 21:17:24,347 INFO ---------input info : images ---------
2023-01-31 21:17:24,347 INFO input_name : images
2023-01-31 21:17:24,347 INFO input_type_rt : nv12
2023-01-31 21:17:24,347 INFO input_space&range : regular
2023-01-31 21:17:24,347 INFO input_layout_rt : NHWC
2023-01-31 21:17:24,347 INFO input_type_train : rgb
2023-01-31 21:17:24,347 INFO input_layout_train : NCHW
2023-01-31 21:17:24,347 INFO norm_type : data_scale
2023-01-31 21:17:24,347 INFO input_shape : 1x3x640x640
2023-01-31 21:17:24,347 INFO input_batch : 1
2023-01-31 21:17:24,347 INFO scale_value : 0.003921568627451,
2023-01-31 21:17:24,347 INFO cal_data_dir : /open_explorer/calib_f32
2023-01-31 21:17:24,347 INFO ---------input info : images end -------
2023-01-31 21:17:24,347 INFO ------------------------------------------
2023-01-31 21:17:24,347 INFO ############# calibration_parameters info #############
2023-01-31 21:17:24,348 INFO preprocess_on : False
2023-01-31 21:17:24,348 INFO calibration_type: : max
2023-01-31 21:17:24,348 INFO cal_data_type : float32
2023-01-31 21:17:24,348 INFO max_percentile : 0.99999
2023-01-31 21:17:24,348 INFO per_channel : True
2023-01-31 21:17:24,348 INFO ############# compiler_parameters info #############
2023-01-31 21:17:24,348 INFO hbdk_pass_through_params: --core-num 2 --fast --O3
2023-01-31 21:17:24,348 INFO input-source : {'images': 'pyramid', '_default_value': 'ddr'}
2023-01-31 21:17:24,354 INFO Convert to runtime bin file sucessfully!
2023-01-31 21:17:24,354 INFO End Model Convert
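The value 0.998216 in the log above is the cosine similarity of the final-layer output of the quantized model; the closer it is to 1, the better the accuracy of the model has been preserved. (Note: model_output/yolov8n_horizon.bin is the converted bin model.)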
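For reference, cosine similarity itself is easy to compute. The sketch below uses the textbook formula on made-up vectors; it does not reproduce how the toolchain gathers layer outputs, and `float_out` and `quant_out` are illustrative values only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


float_out = [0.12, -1.50, 3.25, 0.80]   # made-up reference outputs
quant_out = [0.125, -1.50, 3.25, 0.75]  # the same values after coarse rounding

print(cosine_similarity(float_out, float_out))  # 1.0 up to floating-point rounding
print(cosine_similarity(float_out, quant_out))  # slightly below 1
```

The metric depends only on the direction of the two vectors, so a uniform rescaling of the outputs would leave it unchanged; it is a quick sanity check rather than a full accuracy evaluation.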
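Inference Test
On-Board Test
Package the project folder and upload it to the Sunrise X3 Pi; you can copy it into the board's working directory over ssh or with a USB drive. The steps below assume it is saved to /home/sunrise/project. Inference preprocessing has to convert the input to nv12: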
def bgr2nv12_opencv(image):
    height, width = image.shape[:2]
    area = height * width
    # I420: full-resolution Y plane, followed by planar U and planar V
    yuv420p = cv2.cvtColor(image, cv2.COLOR_BGR2YUV_I420).reshape((area * 3 // 2,))
    y = yuv420p[:area]
    uv_planar = yuv420p[area:].reshape((2, area // 4))
    # Interleave U and V samples (UVUV...) as NV12 requires
    uv_packed = uv_planar.transpose((1, 0)).reshape((area // 2,))

    nv12 = np.zeros_like(yuv420p)
    nv12[:area] = y
    nv12[area:] = uv_packed
    return nv12
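The layout that this conversion produces can be illustrated on a toy frame without OpenCV. The sketch below is illustrative only: `nv12_size` and `interleave_uv` are hypothetical helpers, and the sample values are made up.

```python
def nv12_size(height, width):
    """Bytes in an NV12 frame: a full-resolution Y plane plus an interleaved, subsampled UV plane."""
    return height * width * 3 // 2


def interleave_uv(u_plane, v_plane):
    """Turn planar U and V (I420 order) into the packed UVUV... order used by NV12."""
    packed = []
    for u, v in zip(u_plane, v_plane):
        packed.extend((u, v))
    return packed


# Toy 4x4 frame: 16 Y samples, 4 U samples, 4 V samples.
y_plane = list(range(16))
u_plane = [100, 101, 102, 103]
v_plane = [200, 201, 202, 203]

nv12 = y_plane + interleave_uv(u_plane, v_plane)

print(len(nv12) == nv12_size(4, 4))  # True (24 bytes)
print(nv12[16:])                     # [100, 200, 101, 201, 102, 202, 103, 203]
print(nv12_size(640, 640))           # 614400
```

For a 640x640 input, the NV12 buffer is half the size of the corresponding 3-channel BGR image.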
Run the following in a terminal:
cd /home/sunrise/project
sudo python3 -m pip install opencv-python  # install the X3 inference dependency
mv model_output/yolov8n_horizon.bin ./
sudo python3 step4_inference.py
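You will see the images with the detections drawn on them, and the timing of each stage is printed as well:
Results:
Preprocessing: 30.4 ms
Inference: 168.5 ms
Post-processing: 66 ms
Drawing: 0.8 ms
Total: 265.9 ms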
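As a rough reading of these numbers, the total latency corresponds to about 3.76 frames per second, with the inference stage accounting for roughly 63% of the time. The sketch below just redoes that arithmetic from the figures reported above; the dictionary keys are labels chosen for illustration, and the shares are computed against the reported total.

```python
stages_ms = {
    "preprocess": 30.4,
    "inference": 168.5,
    "postprocess": 66.0,
    "draw": 0.8,
}
total_ms = 265.9  # total as reported above

fps = 1000.0 / total_ms
shares = {name: ms / total_ms for name, ms in stages_ms.items()}

print(f"throughput: {fps:.2f} frames per second")
for name, share in shares.items():
    print(f"{name:>11}: {share:.1%}")
```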
This article is reposted from the Horizon developer community.
Original author: tripleMu