yolov5 on milk-v tpu 256 #13411

tcpipchip · 2024-11-11T12:25:13Z

Search before asking

I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

No response

Bug

Hi Sir,
I recently got the Milk-V 256, a RISC-V processor.
I followed these instructions to run object recognition:
https://milkv.io/docs/duo/application-development/tpu/tpu-introduction
https://milkv.io/docs/duo/application-development/tpu/tpu-docker
https://milkv.io/docs/duo/application-development/tpu/tpu-yolov5
best.zip

It works very well with the pretrained YOLOv5 model https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5n.pt
But when I train my own model on Colab (best.pt) and convert it to run on the Milk-V, I always get a segmentation fault.
train_data.zip
My training data from Colab is attached. On Colab the model works and I can run inference.
The best.pt is attached too.

Environment

YOLOv5 master, Docker, all YOLOv5 master requirements satisfied.

Minimal Reproducible Example

import torch
from models.experimental import attempt_download
model = torch.load(attempt_download("./best.pt"),
                   map_location=torch.device('cpu'))['model'].float()
model.eval()
model.model[-1].export = True
torch.jit.trace(model, torch.rand(1, 3, 640, 640), strict=False).save('./yolov5n_jit.pt')

model_transform.py \
--model_name yolov5n \
--model_def ./yolov5n_jit.pt \
--input_shapes [[1,3,640,640]] \
--pixel_format "rgb" \
--keep_aspect_ratio \
--mean 0,0,0 \
--scale 0.0039216,0.0039216,0.0039216 \
--test_input ./cat.jpg \
--test_result yolov5n_top_outputs.npz \
--mlir yolov5n.mlir
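
The --mean and --scale values above correspond to the "(x - mean) * scale" preprocessing that the toolchain prints in its log, with the scale being roughly 1/255. A minimal sketch of that arithmetic in plain Python; the helper name preprocess_pixel is hypothetical and not part of tpu-mlir:

```python
# Hypothetical helper illustrating (x - mean) * scale for one channel value.
MEAN = 0.0
SCALE = 0.0039216  # approximately 1 / 255, as passed to --scale


def preprocess_pixel(value, mean=MEAN, scale=SCALE):
    """Map an 8-bit pixel value to the normalized range fed to the model."""
    return (value - mean) * scale


if __name__ == "__main__":
    for v in (0, 128, 255):
        print(v, round(preprocess_pixel(v), 4))
```

With these values an 8-bit input in [0, 255] lands in roughly [0, 1], which is what a YOLOv5 model trained on normalized images expects.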

run_calibration.py yolov5n.mlir \
 --dataset ./train_data/train/images \
 --input_num 42 \
 -o ./yolov5n_cali_table
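
Before calibrating, it may help to confirm that the dataset directory really contains images, since the log further down shows run_calibration.py failing with "There is no inputs" when the dataset path was wrong. A small sketch of such a check; the helper name and the list of extensions are assumptions, and the toolchain may accept a different set of files:

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(dataset_dir):
    """Count image files directly inside dataset_dir (0 if it is missing)."""
    root = Path(dataset_dir)
    if not root.is_dir():
        return 0
    return sum(
        1
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    n = count_images("./train_data/train/images")
    print(f"found {n} images")
```

If the count is lower than the value given to --input_num, lowering --input_num or fixing the path would be the first thing to try.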

model_deploy.py \
 --mlir yolov5n.mlir \
 --quantize INT8 \
 --calibration_table ./yolov5n_cali_table \
 --chip cv181x \
 --test_input ./cat.jpg \
 --test_reference yolov5n_top_outputs.npz \
 --compare_all \
 --fuse_preprocess \
 --debug \
 --model yolov5n_int8_fuse.cvimodel
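
The deploy step quantizes to INT8 using the calibration table, and the log further down shows asymmetric=False. A minimal sketch of symmetric per-tensor quantization as it is commonly defined, assuming a single threshold per tensor; this is not tpu-mlir's actual implementation, whose rounding and clamping may differ, and both function names are hypothetical:

```python
def quantize_int8_symmetric(x, threshold):
    """Quantize a float to INT8 with a symmetric scale derived from a threshold."""
    scale = 127.0 / threshold
    q = round(x * scale)  # Python rounds halves to the nearest even integer
    return max(-128, min(127, q))


def dequantize_int8_symmetric(q, threshold):
    """Map an INT8 value back to an approximate float."""
    return q * threshold / 127.0


if __name__ == "__main__":
    threshold = 2.0  # hypothetical calibration threshold for one tensor
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        q = quantize_int8_symmetric(x, threshold)
        print(x, q, round(dequantize_int8_symmetric(q, threshold), 4))
```

Values beyond the threshold saturate, which is why the calibration images should resemble the data the model will see on the device.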

 ./samples/samples_extra/bin/cvi_sample_detector_yolo_v5_fused_preprocess \
 ./yolov5n_int8_fuse.cvimodel \
 ./face.jpg \
 ./yolov5n_out.jpg

Segmentation fault.

It looks like the problem is in my best.pt, because the pretrained yolov5n.pt works fine!
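
One difference between the two checkpoints that may be worth checking is the number of classes, because it changes the channel count of each detection output. A minimal sketch of that arithmetic, assuming the standard YOLOv5 head with 3 anchors per scale and 5 + nc values per anchor; the function name is hypothetical:

```python
def head_channels(num_classes, anchors_per_scale=3):
    """Channels per detection output: 4 box values, 1 objectness, num_classes scores per anchor."""
    return anchors_per_scale * (5 + num_classes)


if __name__ == "__main__":
    print(head_channels(80))  # 80 COCO classes -> 255
    print(head_channels(1))   # hypothetical single-class custom model -> 18
```

For 80 classes this gives 255, which matches the (1, 255, 80, 80) output shapes in the log below. A model trained with a different class count produces a different channel count, so any post-processing that assumes 80 classes would need adjusting. Whether that explains the crash here is not confirmed.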

Additional

Full sequence using yolov5n.pt, where everything works fine:

For more help on how to use Docker, head to https://docs.docker.com/go/guides/
ubuntu@DESKTOP-UHGFA4M:$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ubuntu@DESKTOP-UHGFA4M:$ docker run --privileged --name duotpu -v /workspace -it sophgo/tpuc_dev:v3.1
docker: Error response from daemon: Conflict. The container name "/duotpu" is already in use by container "2a46fc75400fa362ed00811b4ec34bba2612506d3938b0e72f8fabab41350246". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.
ubuntu@DESKTOP-UHGFA4M:$ docker run --privileged --name duotpu -v /workspace -it sophgo/tpuc_dev:v3.1
docker: Error response from daemon: Conflict. The container name "/duotpu" is already in use by container "2a46fc75400fa362ed00811b4ec34bba2612506d3938b0e72f8fabab41350246". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.
ubuntu@DESKTOP-UHGFA4M:$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2a46fc75400f sophgo/tpuc_dev:v3.1 "/bin/bash" 2 days ago Up 12 seconds duotpu
ubuntu@DESKTOP-UHGFA4M:~$ docker exec -it 2a46fc75400f /bin/bash
root@2a46fc75400f:/workspace# pytorch
bash: pytorch: command not found
root@2a46fc75400f:/workspace# ls
best.pt master tpu-mlir tpu-sdk yolov5-master yolov5n_torch
root@2a46fc75400f:/workspace# cd yolov5n_torch/
root@2a46fc75400f:/workspace/yolov5n_torch# ls
_weight_map.csv yolov5n_cv181x_int8_sym_final.mlir yolov5n_jit.pt
best.pt yolov5n_cv181x_int8_sym_model_outputs.npz yolov5n_origin.mlir
cat.jpg yolov5n_cv181x_int8_sym_tpu.mlir yolov5n_top_f32_all_origin_weight.npz
train_data yolov5n_cv181x_int8_sym_tpu_outputs.npz yolov5n_top_f32_all_weight.npz
train_data.zip yolov5n_in_f32.npz yolov5n_top_outputs.npz
work yolov5n_in_ori.npz yolov5n_tpu_addressed_cv181x_int8_sym_weight.npz
yolov5n.mlir yolov5n_int8_fuse.cvimodel yolov5n_tpu_addressed_cv181x_int8_sym_weight_fix.npz
yolov5n_cali_table yolov5n_int8_fuse_tensor_info.txt yolov5n_tpu_lowered_cv181x_int8_sym_weight.npz
root@2a46fc75400f:/workspace/yolov5n_torch# ls r*
ls: cannot access 'r*': No such file or directory
root@2a46fc75400f:/workspace/yolov5n_torch# cd ..
root@2a46fc75400f:/workspace# ls
best.pt master tpu-mlir tpu-sdk yolov5-master yolov5n_torch
root@2a46fc75400f:/workspace# cd yolov5-master/
root@2a46fc75400f:/workspace/yolov5-master# dir
CITATION.cff README.zh-CN.md data main.py segment val.py
CONTRIBUTING.md benchmarks.py detect.py models train.py yolov5n_jit.pt
LICENSE best.pt export.py pyproject.toml tutorial.ipynb
README.md classify hubconf.py requirements.txt utils
root@2a46fc75400f:/workspace/yolov5-master# cat requirements.txt

# YOLOv5 requirements
# Usage: pip install -r requirements.txt

# Base ------------------------------------------------------------------------
gitpython>=3.1.30
matplotlib>=3.3
numpy>=1.23.5
opencv-python>=4.1.1
pillow>=10.3.0
psutil # system resources
PyYAML>=5.3.1
requests>=2.32.2
scipy>=1.4.1
thop>=0.1.1 # FLOPs computation
torch>=1.8.0 # see https://pytorch.org/get-started/locally (recommended)
torchvision>=0.9.0
tqdm>=4.66.3
ultralytics>=8.2.34 # https://ultralytics.com
# protobuf<=3.20.1 # #8012

# Logging ---------------------------------------------------------------------
# tensorboard>=2.4.1
# clearml>=1.2.0
# comet

# Plotting --------------------------------------------------------------------
pandas>=1.1.4
seaborn>=0.11.0

# Export ----------------------------------------------------------------------
# coremltools>=6.0 # CoreML export
# onnx>=1.10.0 # ONNX export
# onnx-simplifier>=0.4.1 # ONNX simplifier
# nvidia-pyindex # TensorRT export
# nvidia-tensorrt # TensorRT export
# scikit-learn<=1.1.2 # CoreML quantization
# tensorflow>=2.4.0,<=2.13.1 # TF exports (-cpu, -aarch64, -macos)
# tensorflowjs>=3.9.0 # TF.js export
# openvino-dev>=2023.0 # OpenVINO export

# Deploy ----------------------------------------------------------------------
setuptools>=70.0.0 # Snyk vulnerability fix
# tritonclient[all]~=2.24.0

# Extras ----------------------------------------------------------------------
# ipython # interactive notebook
# mss # screenshots
# albumentations>=1.0.3
# pycocotools>=2.0.6 # COCO mAP

root@2a46fc75400f:/workspace/yolov5-master# nano requirements.txt
root@2a46fc75400f:/workspace/yolov5-master# pip install -r requirements.txt
Requirement already satisfied: gitpython>=3.1.30 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 5)) (3.1.32)
Requirement already satisfied: matplotlib>=3.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 6)) (3.7.2)
Requirement already satisfied: numpy>=1.23.5 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 7)) (1.24.3)
Requirement already satisfied: opencv-python>=4.1.1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 8)) (4.8.0.74)
Requirement already satisfied: pillow>=10.3.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 9)) (11.0.0)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 10)) (5.9.5)
Requirement already satisfied: PyYAML>=5.3.1 in /usr/lib/python3/dist-packages (from -r requirements.txt (line 11)) (5.4.1)
Requirement already satisfied: requests>=2.32.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 12)) (2.32.3)
Requirement already satisfied: scipy>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 13)) (1.11.1)
Requirement already satisfied: thop>=0.1.1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 14)) (0.1.1.post2209072238)
Requirement already satisfied: torch>=1.8.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 15)) (2.0.1+cpu)
Requirement already satisfied: torchvision>=0.9.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 16)) (0.15.2+cpu)
Requirement already satisfied: tqdm>=4.66.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 17)) (4.67.0)
Requirement already satisfied: ultralytics>=8.2.34 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 18)) (8.3.28)
Requirement already satisfied: pandas>=1.1.4 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 27)) (2.0.3)
Requirement already satisfied: seaborn>=0.11.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 28)) (0.13.2)
Requirement already satisfied: setuptools>=70.0.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 42)) (75.3.0)
Requirement already satisfied: gitdb<5,>=4.0.1 in /usr/local/lib/python3.10/dist-packages (from gitpython>=3.1.30->-r requirements.txt (line 5)) (4.0.10)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (1.1.0)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (2.8.2)
Requirement already satisfied: pyparsing<3.1,>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (3.0.9)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (4.42.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (0.11.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (23.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->-r requirements.txt (line 12)) (3.4)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->-r requirements.txt (line 12)) (3.2.0)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->-r requirements.txt (line 12)) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->-r requirements.txt (line 12)) (2023.7.22)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.8.0->-r requirements.txt (line 15)) (1.12)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.8.0->-r requirements.txt (line 15)) (4.5.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.8.0->-r requirements.txt (line 15)) (3.1.2)
Requirement already satisfied: filelock in /usr/lib/python3/dist-packages (from torch>=1.8.0->-r requirements.txt (line 15)) (3.6.0)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.8.0->-r requirements.txt (line 15)) (3.1)
Requirement already satisfied: py-cpuinfo in /usr/local/lib/python3.10/dist-packages (from ultralytics>=8.2.34->-r requirements.txt (line 18)) (9.0.0)
Requirement already satisfied: ultralytics-thop>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from ultralytics>=8.2.34->-r requirements.txt (line 18)) (2.0.11)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.1.4->-r requirements.txt (line 27)) (2023.3)
Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.1.4->-r requirements.txt (line 27)) (2023.3)
Requirement already satisfied: smmap<6,>=3.0.1 in /usr/local/lib/python3.10/dist-packages (from gitdb<5,>=4.0.1->gitpython>=3.1.30->-r requirements.txt (line 5)) (5.0.0)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.7->matplotlib>=3.3->-r requirements.txt (line 6)) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.8.0->-r requirements.txt (line 15)) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.8.0->-r requirements.txt (line 15)) (1.3.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@2a46fc75400f:/workspace/yolov5-master# ls
CITATION.cff README.zh-CN.md data main.py segment val.py
CONTRIBUTING.md benchmarks.py detect.py models train.py yolov5n_jit.pt
LICENSE best.pt export.py pyproject.toml tutorial.ipynb
README.md classify hubconf.py requirements.txt utils
root@2a46fc75400f:/workspace/yolov5-master# nano main.py
root@2a46fc75400f:/workspace/yolov5-master#
root@2a46fc75400f:/workspace/yolov5-master#
root@2a46fc75400f:/workspace/yolov5-master#
root@2a46fc75400f:/workspace/yolov5-master# wget https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5n.pt
--2024-11-11 19:18:11-- https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5n.pt
Resolving github.com (github.com)... 20.201.28.151
Connecting to github.com (github.com)|20.201.28.151|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/3444cd1f-277c-414f-bdc9-3ac8ed6062df?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241111%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241111T111811Z&X-Amz-Expires=300&X-Amz-Signature=b7761184e059f5a596b94e432bf731d13dc16857dab233d44d18080fc0f23350&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Dyolov5n.pt&response-content-type=application%2Foctet-stream [following]
--2024-11-11 19:18:11-- https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/3444cd1f-277c-414f-bdc9-3ac8ed6062df?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241111%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241111T111811Z&X-Amz-Expires=300&X-Amz-Signature=b7761184e059f5a596b94e432bf731d13dc16857dab233d44d18080fc0f23350&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Dyolov5n.pt&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4062133 (3.9M) [application/octet-stream]
Saving to: ‘yolov5n.pt’

yolov5n.pt 100%[================================================>] 3.87M 8.31MB/s in 0.5s

2024-11-11 19:18:12 (8.31 MB/s) - ‘yolov5n.pt’ saved [4062133/4062133]

root@2a46fc75400f:/workspace/yolov5-master# cat main.py
import torch
from models.experimental import attempt_download
model = torch.load(attempt_download("./yolov5n.pt"),
map_location=torch.device('cpu'))['model'].float()
model.eval()
model.model[-1].export = True
torch.jit.trace(model, torch.rand(1, 3, 640, 640), strict=False).save('./yolov5n_jit.pt')
root@2a46fc75400f:/workspace/yolov5-master# python main.py
root@2a46fc75400f:/workspace/yolov5-master# cp yolov5n_jit.pt /workspace/yolov5-master/^C
root@2a46fc75400f:/workspace/yolov5-master# cd ..
root@2a46fc75400f:/workspace# cd yolov5n_torch
root@2a46fc75400f:/workspace/yolov5n_torch# cp /workspace/yolov5-master/yolov5n_jit.pt .
root@2a46fc75400f:/workspace/yolov5n_torch# source ./tpu-mlir/envsetup.sh
bash: ./tpu-mlir/envsetup.sh: No such file or directory
root@2a46fc75400f:/workspace/yolov5n_torch# cd ..
root@2a46fc75400f:/workspace# source ./tpu-mlir/envsetup.sh
root@2a46fc75400f:/workspace# cd yolov5n_torch/
root@2a46fc75400f:/workspace/yolov5n_torch# cp -rf ${TPUC_ROOT}/regression/dataset/COCO2017 .
root@2a46fc75400f:/workspace/yolov5n_torch# cp -rf ${TPUC_ROOT}/regression/image .
root@2a46fc75400f:/workspace/yolov5n_torch# model_transform.py \
--model_name yolov5n \
--model_def ../yolov5n_jit.pt \
--input_shapes [[1,3,640,640]] \
--pixel_format "rgb" \
--keep_aspect_ratio \
--mean 0,0,0 \
--scale 0.0039216,0.0039216,0.0039216 \
--test_input ../image/dog.jpg \
--test_result yolov5n_top_outputs.npz \
--output_names 1219,1234,1249 \
--mlir yolov5n.mlir
SOPHGO Toolchain v1.3.228-g19ca95e9-20230921
2024/11/11 19:22:19 - INFO :
_____________________________________________________
| preprocess: |
| (x - mean) * scale |
'-------------------------------------------------------'
config Preprocess args :
resize_dims : same to net input dims
keep_aspect_ratio : True
keep_ratio_mode : letterbox
pad_value : 0
pad_type : center
--------------------------
mean : [0.0, 0.0, 0.0]
scale : [0.0039216, 0.0039216, 0.0039216]
--------------------------
pixel_format : rgb
channel_format : nchw

Traceback (most recent call last):
  File "/workspace/tpu-mlir/python/tools/model_transform.py", line 272, in <module>
    tool = get_model_transform(args)
  File "/workspace/tpu-mlir/python/tools/model_transform.py", line 232, in get_model_transform
    tool = TorchTransformer(args.model_name, args.model_def, args.input_shapes,
  File "/workspace/tpu-mlir/python/tools/model_transform.py", line 204, in __init__
    self.converter = TorchConverter(self.model_name, self.model_def, input_shapes, input_types,
  File "/workspace/tpu-mlir/python/transform/TorchConverter.py", line 55, in __init__
    self.load_torch_model(torch_file, input_shapes, input_types, output_names)
  File "/workspace/tpu-mlir/python/transform/TorchConverter.py", line 251, in load_torch_model
    self.model = torch.jit.load(torch_file, map_location=torch.device('cpu'))
  File "/usr/local/lib/python3.10/dist-packages/torch/jit/_serialization.py", line 152, in load
    raise ValueError("The provided filename {} does not exist".format(f)) # type: ignore[str-bytes-safe]
ValueError: The provided filename ../yolov5n_jit.pt does not exist
root@2a46fc75400f:/workspace/yolov5n_torch# model_transform.py \
--model_name yolov5n \
--model_def ./yolov5n_jit.pt \
--input_shapes [[1,3,640,640]] \
--pixel_format "rgb" \
--keep_aspect_ratio \
--mean 0,0,0 \
--scale 0.0039216,0.0039216,0.0039216 \
--test_input ./image/dog.jpg \
--test_result yolov5n_top_outputs.npz \
--output_names 1219,1234,1249 \
--mlir yolov5n.mlir
SOPHGO Toolchain v1.3.228-g19ca95e9-20230921
2024/11/11 19:23:07 - INFO :
_____________________________________________________
| preprocess: |
| (x - mean) * scale |
'-------------------------------------------------------'
config Preprocess args :
resize_dims : same to net input dims
keep_aspect_ratio : True
keep_ratio_mode : letterbox
pad_value : 0
pad_type : center
--------------------------
mean : [0.0, 0.0, 0.0]
scale : [0.0039216, 0.0039216, 0.0039216]
--------------------------
pixel_format : rgb
channel_format : nchw

Save mlir file: yolov5n_origin.mlir
[Running]: tpuc-opt yolov5n_origin.mlir --shape-infer --canonicalize --extra-optimize -o yolov5n.mlir
[Success]: tpuc-opt yolov5n_origin.mlir --shape-infer --canonicalize --extra-optimize -o yolov5n.mlir
Mlir file generated:yolov5n.mlir
2024/11/11 19:23:10 - INFO :
load_config Preprocess args :
resize_dims : [640, 640]
keep_aspect_ratio : True
keep_ratio_mode : letterbox
pad_value : 0
pad_type : center
input_dims : [640, 640]
--------------------------
mean : [0.0, 0.0, 0.0]
scale : [0.0039216, 0.0039216, 0.0039216]
--------------------------
pixel_format : rgb
channel_format : nchw

[CMD]: model_runner.py --input yolov5n_in_f32.npz --model ./yolov5n_jit.pt --output yolov5n_ref_outputs.npz
80: 100%|████████████████████████████████████████████████████████████████████████| 1230/1230 [00:01<00:00, 1134.76it/s]
Saving yolov5n_ref_outputs.npz
[CMD]: model_runner.py --input yolov5n_in_f32.npz --model yolov5n.mlir --output yolov5n_top_outputs.npz
[##################################################] 100%
Saving yolov5n_top_outputs.npz
[Running]: npz_tool.py compare yolov5n_top_outputs.npz yolov5n_ref_outputs.npz --tolerance 0.99,0.99 --except - -vv
compare 1249: 100%|█████████████████████████████████████████████████████████████████▋| 199/200 [00:06<00:00, 37.17it/s][x.1 ] EQUAL [PASSED]
(1, 3, 640, 640) float32
[input.62 ] SIMILAR [PASSED]
(1, 16, 320, 320) float32
cosine_similarity = 1.000000
euclidean_similarity = 0.999999
sqnr_similarity = 123.899231
[input.26 ] SIMILAR [PASSED]
(1, 16, 320, 320) float32
cosine_similarity = 1.000000
euclidean_similarity = 1.000000
sqnr_similarity = 127.972746
[103 ] SIMILAR [PASSED]
(1, 16, 320, 320) float32
cosine_similarity = 1.000000
euclidean_similarity = 1.000000
sqnr_similarity = 122.331476
[input.60 ] SIMILAR [PASSED]
...
(1, 255, 80, 80) float32
cosine_similarity = 1.000000
euclidean_similarity = 1.000000
sqnr_similarity = 119.547615
[1234 ] SIMILAR [PASSED]
(1, 255, 40, 40) float32
cosine_similarity = 1.000000
euclidean_similarity = 1.000000
sqnr_similarity = 118.485079
[1249 ] SIMILAR [PASSED]
(1, 255, 20, 20) float32
cosine_similarity = 1.000000
euclidean_similarity = 1.000000
sqnr_similarity = 118.825769
200 compared
200 passed
1 equal, 3 close, 196 similar
0 failed
0 not equal, 0 not similar
min_similiarity = (0.9999997615814209, 0.999998192529101, 114.64153289794922)
Target yolov5n_top_outputs.npz
Reference yolov5n_ref_outputs.npz
npz compare PASSED.
compare 1249: 100%|██████████████████████████████████████████████████████████████████| 200/200 [00:08<00:00, 24.98it/s]
[Success]: npz_tool.py compare yolov5n_top_outputs.npz yolov5n_ref_outputs.npz --tolerance 0.99,0.99 --except - -vv
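
The comparison above reports cosine similarity and an SQNR figure for each tensor. A minimal sketch of how such metrics are commonly computed, assuming flat lists of floats; the function names are hypothetical and the toolchain's exact definitions may differ:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat two all-zero vectors as identical, otherwise as dissimilar.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def sqnr_db(reference, target):
    """Signal-to-quantization-noise ratio in dB of target against reference."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, target))
    if noise == 0.0:
        return math.inf  # identical tensors
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    ref = [0.10, -0.25, 0.50, 0.75]
    quantized = [0.11, -0.24, 0.49, 0.76]  # hypothetical quantized values
    print(cosine_similarity(ref, quantized), sqnr_db(ref, quantized))
```

Identical tensors give an infinite SQNR under this definition, which is consistent with the "inf" that appears in the final cvimodel comparison.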
root@2a46fc75400f:/workspace/yolov5n_torch# run_calibration.py yolov5n.mlir \
--dataset ../COCO2017 \
--input_num 100 \
-o ./yolov5n_cali_table
SOPHGO Toolchain v1.3.228-g19ca95e9-20230921
Traceback (most recent call last):
  File "/workspace/tpu-mlir/python/tools/run_calibration.py", line 36, in <module>
    selector = DataSelector(args.dataset, args.input_num, args.data_list)
  File "/workspace/tpu-mlir/python/calibration/data_selector.py", line 47, in __init__
    raise RuntimeError("There is no inputs")
RuntimeError: There is no inputs
root@2a46fc75400f:/workspace/yolov5n_torch# run_calibration.py yolov5n.mlir \
--dataset ./COCO2017 \
--input_num 100 \
-o ./yolov5n_cali_table
SOPHGO Toolchain v1.3.228-g19ca95e9-20230921
GmemAllocator use OpSizeOrderAssign
reused mem is 3276800, all mem is 43767600
2024/11/11 19:24:09 - INFO :
load_config Preprocess args :
resize_dims : [640, 640]
keep_aspect_ratio : True
keep_ratio_mode : letterbox
pad_value : 0
pad_type : center
input_dims : [640, 640]
--------------------------
mean : [0.0, 0.0, 0.0]
scale : [0.0039216, 0.0039216, 0.0039216]
--------------------------
pixel_format : rgb
channel_format : nchw

last input data (idx=100) not valid, droped
input_num = 100, ref = 100
real input_num = 100
activation_collect_and_calc_th for op: 1249: 100%|███████████████████████████████████| 200/200 [04:25<00:00, 1.33s/it]
[2048] threshold: 1249: 100%|███████████████████████████████████████████████████████| 200/200 [00:00<00:00, 235.10it/s]
GmemAllocator use OpSizeOrderAssign
reused mem is 3276800, all mem is 43767600
GmemAllocator use OpSizeOrderAssign
reused mem is 3276800, all mem is 43767600
prepare data from 100
tune op: 1249: 100%|█████████████████████████████████████████████████████████████████| 200/200 [07:13<00:00, 2.17s/it]
auto tune end, run time:433.61561346054077
root@2a46fc75400f:/workspace/yolov5n_torch# model_deploy.py \
--mlir yolov5n.mlir \
--quantize INT8 \
--calibration_table ./yolov5n_cali_table \
--chip cv181x \
--test_input ./image/dog.jpg \
--test_reference yolov5n_top_outputs.npz \
--compare_all \
--tolerance 0.96,0.72 \
--fuse_preprocess \
--debug \
--model yolov5n_int8_fuse.cvimodel
SOPHGO Toolchain v1.3.228-g19ca95e9-20230921
2024/11/11 19:37:39 - INFO :
load_config Preprocess args :
resize_dims : [640, 640]
keep_aspect_ratio : True
keep_ratio_mode : letterbox
pad_value : 0
pad_type : center
input_dims : [640, 640]
--------------------------
mean : [0.0, 0.0, 0.0]
scale : [0.0039216, 0.0039216, 0.0039216]
--------------------------
pixel_format : rgb
channel_format : nchw

Add preprocess, set the following params:
2024/11/11 19:37:39 - INFO :
_____________________________________________________
| preprocess: |
| (x - mean) * scale |
'-------------------------------------------------------'
config Preprocess args :
resize_dims : [640, 640]
keep_aspect_ratio : True
keep_ratio_mode : letterbox
pad_value : 0
pad_type : center
--------------------------
mean : [0.0, 0.0, 0.0]
scale : [1.0, 1.0, 1.0]
--------------------------
pixel_format : rgb
channel_format : nchw

[Running]: tpuc-opt yolov5n.mlir --chip-assign="chip=cv181x" --import-calibration-table="file=./yolov5n_cali_table asymmetric=False" --chip-top-optimize --fuse-preprocess="mode=INT8 customization_format=RGB_PLANAR align=False" --convert-top-to-tpu="mode=INT8 asymmetric=False linear_quant_mode=NORMAL doWinograd=False ignore_f16_overflow=False" --canonicalize -o yolov5n_cv181x_int8_sym_tpu.mlir
Entering FusePreprocessPass.
Inserting ScalelutOp.
[Success]: tpuc-opt yolov5n.mlir --chip-assign="chip=cv181x" --import-calibration-table="file=./yolov5n_cali_table asymmetric=False" --chip-top-optimize --fuse-preprocess="mode=INT8 customization_format=RGB_PLANAR align=False" --convert-top-to-tpu="mode=INT8 asymmetric=False linear_quant_mode=NORMAL doWinograd=False ignore_f16_overflow=False" --canonicalize -o yolov5n_cv181x_int8_sym_tpu.mlir
[CMD]: model_runner.py --input yolov5n_in_ori.npz --model yolov5n_cv181x_int8_sym_tpu.mlir --output yolov5n_cv181x_int8_sym_tpu_outputs.npz
[##################################################] 100%
[Running]: npz_tool.py compare yolov5n_cv181x_int8_sym_tpu_outputs.npz yolov5n_top_outputs.npz --tolerance 0.96,0.72 --except - -vv
compare 1249: 99%|█████████████████████████████████████████████████████████████████▌| 141/142 [00:05<00:00, 21.14it/s][input.26 ] SIMILAR [PASSED]
(1, 16, 320, 320) float32
cosine_similarity = 0.999769
euclidean_similarity = 0.978254
sqnr_similarity = 32.948797
[103 ] SIMILAR [PASSED]
(1, 16, 320, 320) float32
cosine_similarity = 0.999255
euclidean_similarity = 0.961272
sqnr_similarity = 24.556572
...
(1, 255, 40, 40) float32
cosine_similarity = 0.999221
euclidean_similarity = 0.959803
sqnr_similarity = 18.724862
[1249 ] SIMILAR [PASSED]
(1, 255, 20, 20) float32
cosine_similarity = 0.999214
euclidean_similarity = 0.960290
sqnr_similarity = 18.388116
142 compared
142 passed
0 equal, 0 close, 142 similar
0 failed
0 not equal, 0 not similar
min_similiarity = (0.9679524302482605, 0.7443984113616068, 11.602303981781006)
Target yolov5n_cv181x_int8_sym_tpu_outputs.npz
Reference yolov5n_top_outputs.npz
npz compare PASSED.
compare 1249: 100%|██████████████████████████████████████████████████████████████████| 142/142 [00:06<00:00, 22.79it/s]
[Success]: npz_tool.py compare yolov5n_cv181x_int8_sym_tpu_outputs.npz yolov5n_top_outputs.npz --tolerance 0.96,0.72 --except - -vv
[Running]: tpuc-opt yolov5n_cv181x_int8_sym_tpu.mlir --mlir-disable-threading --strip-io-quant="quant_input=False quant_output=False" --chip-tpu-optimize --distribute='num_device=1' --weight-reorder --subnet-divide="dynamic=False" --op-reorder --layer-group="opt=2" --parallel='num_core=1' --address-assign -o yolov5n_cv181x_int8_sym_final.mlir
==---------------------------==
Run LayerGroupSearchPass :
Searching the optimal layer groups
==---------------------------==

=======================================================
* Dynamic Programming layer group with cluster

total num of base_group is 7
clusters idx(size): 0(1), 1(2), 3(2), 5(2), 7(2), 9(2), 11(2), 13(1), 14(1), 15(2), 17(2), 19(2), 21(2), 23(2), 25(2), 27(2), 29(2), 31(2), 33(1), 34(2), 36(2), 38(2), 40(2), 42(2), 44(2), 46(2), 48(2), 50(2), 52(2), 54(2), 56(1), 57(1), 58(2), 60(1), 61(2), 63(2), 65(2), 67(2), 69(2), 71(2), 73(2), 75(2), 77(2), 79(2), 81(2), 83(2), 85(2), 87(2), 89(2), 91(2), 93(1), 94(1), 95(2), 97(2), 99(2), 101(2), 103(2), 105(2), 107(2), 109(1), 110(2), 112(2), 114(2), 116(2), 118(2), 120(2), 122(1), 123(1), 124(2), 126(2), 128(2), 130(2), 132(2), 134(2), 136(1), 137(2), 139(2),
process base group 0, layer_num=141, cluster_num=77
Searching best group slices...
[#################################################] 100%
clusters idx(size): 0(1),
process base group 1, layer_num=1, cluster_num=1
clusters idx(size): 0(1),
process base group 2, layer_num=1, cluster_num=1
clusters idx(size): 0(1),
process base group 3, layer_num=1, cluster_num=1
clusters idx(size): 0(1),
process base group 4, layer_num=1, cluster_num=1
clusters idx(size): 0(1),
process base group 5, layer_num=1, cluster_num=1
clusters idx(size): 0(1),
process base group 6, layer_num=1, cluster_num=1

Consider redundant computation and gdma cost

The final cost of the two group is 1182594
//// Group cost 1182594, optimal cut idx 139
The final cost of the two group is 1116710
//// Group cost 1116710, optimal cut idx 138
The final cost of the two group is 1315164
The final cost of the two group is 970894
//// Group cost 970894, optimal cut idx 137
The final cost of the two group is 866493
//// Group cost 866493, optimal cut idx 136
The final cost of the two group is 877481
The final cost of the two group is 941308
The final cost of the two group is 892746
The pre cost of the two group is 898167
The final cost of the two group is 901710
//// Group cost 901710, optimal cut idx 132
The final cost of the two group is 832079
....
The final cost of the two group is 4092392
//// Group cost 4092392, optimal cut idx 0

Merge cut idx to reduce gdma cost

==---------------------------==
Run GroupPostTransformPass :
Some transform after layer groups is determined
==---------------------------==
==---------------------------==
Run TimeStepAssignmentPass :
Assign timestep task for each group.
==---------------------------==
==---------------------------==
Run LocalMemoryAllocationPass :
Allocate local memory for all layer groups
==---------------------------==
==---------------------------==
Run TimeStepCombinePass :
Combine time step for better parallel balance
==---------------------------==
==---------------------------==
Run GroupDataMoveOverlapPass :
Overlap data move between two layer group
==---------------------------==
GmemAllocator use OpSizeOrderAssign
[Success]: tpuc-opt yolov5n_cv181x_int8_sym_tpu.mlir --mlir-disable-threading --strip-io-quant="quant_input=False quant_output=False" --chip-tpu-optimize --distribute='num_device=1' --weight-reorder --subnet-divide="dynamic=False" --op-reorder --layer-group="opt=2" --parallel='num_core=1' --address-assign -o yolov5n_cv181x_int8_sym_final.mlir
[Running]: tpuc-opt yolov5n_cv181x_int8_sym_final.mlir --codegen="model_file=yolov5n_int8_fuse.cvimodel embed_debug_info=true model_version=latest" -o /dev/null
[oc_pos=32] cur_oc 8, stepSize 1024, compressedSize 1040, SKIP
[Success]: tpuc-opt yolov5n_cv181x_int8_sym_final.mlir --codegen="model_file=yolov5n_int8_fuse.cvimodel embed_debug_info=true model_version=latest" -o /dev/null
[CMD]: model_runner.py --input yolov5n_in_ori.npz --model yolov5n_int8_fuse.cvimodel --output yolov5n_cv181x_int8_sym_model_outputs.npz
setenv:cv181x
Start TPU Simulator for cv181x
device[0] opened, 4294967296
version: 1.4.0
yolov5n Build at 2024-11-11 19:37:51 For platform cv181x
Cmodel: bm_load_cmdbuf
Max SharedMem size:2457600
Cmodel: bm_run_cmdbuf
device[0] closed
[Running]: npz_tool.py compare yolov5n_cv181x_int8_sym_model_outputs.npz yolov5n_cv181x_int8_sym_tpu_outputs.npz --tolerance 0.99,0.90 --except - -vv
compare 1249_f32: 88%|█████████████████████████████████████████████████████████▊ | 7/8 [00:00<00:00, 69.16it/s][964 ] EQUAL [PASSED]
(1, 64, 80, 80) float32
[1081 ] EQUAL [PASSED]
(1, 128, 40, 40) float32
[input.1 ] EQUAL [PASSED]
(1, 256, 20, 20) float32
[1198 ] EQUAL [PASSED]
(1, 256, 20, 20) float32
[1219_f32 ] EQUAL [PASSED]
(1, 255, 80, 80) float32
[1234_f32 ] EQUAL [PASSED]
(1, 255, 40, 40) float32
[1249 ] EQUAL [PASSED]
(1, 255, 20, 20) float32
[1249_f32 ] EQUAL [PASSED]
(1, 255, 20, 20) float32
8 compared
8 passed
8 equal, 0 close, 0 similar
0 failed
0 not equal, 0 not similar
min_similiarity = (1.0, 1.0, inf)
Target yolov5n_cv181x_int8_sym_model_outputs.npz
Reference yolov5n_cv181x_int8_sym_tpu_outputs.npz
npz compare PASSED.
compare 1249_f32: 100%|██████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 27.38it/s]
[Success]: npz_tool.py compare yolov5n_cv181x_int8_sym_model_outputs.npz yolov5n_cv181x_int8_sym_tpu_outputs.npz --tolerance 0.99,0.90 --except - -vv
root@2a46fc75400f:/workspace/yolov5n_torch# scp -r /workspace/tpu-sdk root@192.168.42.1:/mnt/tpu/
root@192.168.42.1's password:
OpenCVModules-release.cmake 100% 2053 402.1KB/s 00:00
haarcascade_eye.xml 100% 333KB 2.7MB/s 00:00
haarcascade_smile.xml 100% 184KB 2.7MB/s 00:00
....
libcvimath-static.a 100% 172KB 2.6MB/s 00:00
libcviruntime.so 100% 574KB 2.9MB/s 00:00
root@2a46fc75400f:/workspace/yolov5n_torch# scp /workspace/yolov5n_torch/yolov5n_int8_fuse.cvimodel root@192.168.42.1:/mnt/tpu/tpu-sdk/
root@192.168.42.1's password:
yolov5n_int8_fuse.cvimodel 100% 2158KB 2.9MB/s 00:00
root@2a46fc75400f:/workspace/yolov5n_torch# ls -l
total 389176
drwxr-xr-x 2 root root 4096 Nov 11 19:21 COCO2017
-rw-r--r-- 1 root root 12398 Nov 11 19:37 _weight_map.csv
-rwxr-xr-x 1 root root 14447400 Nov 9 09:00 best.pt
-rwxr-xr-x 1 root root 40717 Oct 29 07:42 cat.jpg
drwxr-xr-x 2 root root 4096 Nov 11 19:21 image
drwxr-xr-x 5 root root 4096 Nov 7 14:12 train_data
-rwxr-xr-x 1 root root 2524205 Nov 8 01:36 train_data.zip
drwxr-xr-x 2 root root 4096 Nov 9 10:02 work
-rw-r--r-- 1 root root 64711 Nov 11 19:23 yolov5n.mlir
-rw-r--r-- 1 root root 8011 Nov 11 19:35 yolov5n_cali_table
-rw-r--r-- 1 root root 2210112 Nov 11 19:37 yolov5n_int8_fuse.cvimodel

root@2a46fc75400f:/workspace/yolov5n_torch#
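The npz_tool.py compare step in the log above checks each output tensor of the compiled model against a reference with `--tolerance 0.99,0.90`. A minimal sketch of that kind of check is given here, assuming the first tolerance value is a minimum cosine similarity (an assumption, not confirmed by the log). The function names are hypothetical and this is not the npz_tool.py implementation; tensors are treated as flat lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: two all-zero tensors count
        # as identical, a zero tensor against a non-zero one does not.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def passes(a, b, min_cosine=0.99):
    """True when the two tensors are at least `min_cosine` similar."""
    return cosine_similarity(a, b) >= min_cosine


if __name__ == "__main__":
    reference = [0.1, 0.5, -0.3, 0.9]
    quantized = [0.1, 0.5, -0.3, 0.9]  # identical values, like "EQUAL" in the log
    print(passes(reference, quantized))
```

A passing comparison like the one in the log only says that the simulator output agrees with the reference tensors; it does not by itself say anything about how the sample program on the board parses those tensors.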

Now using best.pt:

model_deploy.py \
 --mlir yolov5n.mlir \
 --quantize INT8 \
 --calibration_table ./yolov5n_cali_table \
 --chip cv181x \
 --test_input ./cat.jpg \
 --test_reference yolov5n_top_outputs.npz \
 --compare_all \
 --fuse_preprocess \
 --debug \
 --model yolov5n_int8_fuse.cvimodel

Thank you!

Are you willing to submit a PR?

Yes I'd like to help by submitting a PR!


UltralyticsAssistant · 2024-11-11T12:26:14Z

👋 Hello @tcpipchip, thank you for your interest in YOLOv5 🚀! It looks like you're working with the MILK-V 256, a RISC-V processor, and encountering a segmentation fault when running your exported model. No worries, we're here to help! 😊

For 🐛 Bug Reports like this, a minimum reproducible example is crucial, and you've done a great job in providing detailed steps and descriptions! This helps us understand the issue you're facing better. An Ultralytics engineer will review your report and assist you soon.

In the meantime, please verify you've set up your environment correctly:

Requirements

Ensure you have Python>=3.8.0 installed with all the relevant libraries from the requirements.txt and importantly, make sure you are using PyTorch>=1.8. To ensure everything is set up correctly, you might want to recreate your environment from scratch:

Clone the YOLOv5 repository.
Navigate into the cloned directory and install the necessary dependencies using the requirements file.

Environments

YOLOv5 runs smoothly in various environments like notebooks (Google Colab, Kaggle, etc.), cloud environments (Google Cloud, Amazon Web Services), or using Docker images with all dependencies pre-installed. Ensure your environment is up-to-date and configured correctly, including CUDA, cuDNN, Python, and PyTorch installations, particularly if you are leveraging GPU resources.

Debugging Tips

Double-check your export process, especially any parameters or configurations specific to the MILK-V setup.
Compare your Colab-trained model's configuration and preprocessing steps with those of the pre-trained yolov5n.pt as closely as possible.
If applicable, verify the Docker setup and try to eliminate any conflicts or errors present.

Stay tuned, and thank you for providing a comprehensive report! 📝 If there's anything else you can share about the exact error message or log outputs, feel free to add that information here. Our team is eager to assist you further! 🚀

tcpipchip · 2024-11-11T12:36:28Z

Yes, the requirements are OK (Python and PyTorch).
Could you please train on my images and send me your .pt version?

pderrenger · 2024-11-11T15:33:16Z

@tcpipchip I'm sorry, but we can't provide private training services. However, you can follow our Train Custom Data guide to train your model. If you encounter issues, feel free to ask for help here.

tcpipchip · 2024-11-11T15:40:10Z

But do you have any tips about my problem?

pderrenger · 2024-11-11T18:37:22Z

It seems like the issue might be related to the conversion process of your custom model. Ensure your model's architecture matches the pre-trained model you successfully converted, and double-check the conversion steps for any discrepancies.
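One concrete way to check whether a custom model matches the pre-trained one is the channel count of the detection outputs. In the standard YOLOv5 detection head each scale has 3 anchors, and each anchor predicts 4 box values, 1 objectness score and one score per class, so the channel count is 3 * (classes + 5). The sketch below only does that arithmetic; the function names are hypothetical, and it assumes the default 3 anchors per scale.

```python
def head_channels(num_classes, anchors_per_scale=3):
    """Channels of one YOLOv5 detection output for a given class count."""
    # 4 box coordinates + 1 objectness score + one score per class.
    return anchors_per_scale * (num_classes + 5)


def classes_from_channels(channels, anchors_per_scale=3):
    """Infer the class count from a detection output's channel count."""
    per_anchor, remainder = divmod(channels, anchors_per_scale)
    if remainder != 0 or per_anchor < 6:
        raise ValueError(
            f"{channels} channels do not fit {anchors_per_scale} anchors per scale"
        )
    return per_anchor - 5


if __name__ == "__main__":
    print(head_channels(80))           # COCO-pretrained yolov5n: 80 classes
    print(classes_from_channels(255))  # channel count seen in the compare log
```

Under this layout, the (1, 255, 80, 80) outputs in the comparison log correspond to an 80-class model. A model trained on a custom dataset with a different number of classes would produce a different channel count, which is worth comparing against whatever the deployment sample expects.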

tcpipchip · 2024-11-11T18:41:47Z

I am now investigating whether the image size is the cause, and testing with other third-party pre-trained .pt files.

pderrenger · 2024-11-12T10:45:22Z

Testing with different image sizes and pre-trained models is a good approach. Ensure that the input dimensions match those expected by the model, and verify compatibility with the latest YOLOv5 version. If issues persist, consider checking the model's architecture and conversion process for inconsistencies.
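As a rough aid for the input-dimension check, the sketch below computes the detection grid sizes a YOLOv5 model produces for a given input size, assuming the standard three output strides of 8, 16 and 32. The function name is hypothetical. For a 640x640 input it gives the 80x80, 40x40 and 20x20 grids that appear in the comparison log earlier in this thread.

```python
STRIDES = (8, 16, 32)  # standard YOLOv5 P3, P4, P5 output strides


def grid_sizes(height, width, strides=STRIDES):
    """Return the (rows, cols) of each detection grid for an input size."""
    max_stride = max(strides)
    if height % max_stride or width % max_stride:
        raise ValueError(
            f"input {height}x{width} must be a multiple of {max_stride}"
        )
    return [(height // s, width // s) for s in strides]


if __name__ == "__main__":
    for rows, cols in grid_sizes(640, 640):
        print(rows, cols)
```

If the model was traced or trained at a different size than the `--input_shapes` value given to model_transform.py, the grids will differ from what this prints for 640x640, so comparing both is a quick sanity check.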

tcpipchip · 2024-11-15T17:02:57Z

Got it working after 100 hours of trying:
https://milk-v.blogspot.com/2024/10/milk-v-yolov5-criando-dataset.html

pderrenger · 2024-11-16T06:33:25Z

@tcpipchip glad to hear you resolved your issue with YOLOv5 on the MILK-V TPU! For others who might encounter similar challenges with custom model deployment on TPU devices, I recommend checking our model export guide to understand the correct conversion steps and requirements for various hardware targets.

tcpipchip · 2024-11-16T17:05:33Z

Thanks. I will add your link to the blog.

Milk-V uses an export.py to convert to ONNX; it looks like the same code as your company's.

pderrenger · 2024-11-17T07:47:35Z

Thank you for sharing your blog post! While we appreciate the mention, please note that YOLOv5's ONNX export functionality is open-source under the AGPL-3.0 license, as documented in our model export guide. We're glad you found the TPU deployment process helpful.

tcpipchip added the bug (Something isn't working) label on Nov 11, 2024

UltralyticsAssistant added the detect (Object Detection issues, PR's), embedded (Pertaining to IOT or low-power devices), and exports (Model exports: ONNX, TensorRT, TFLite, etc.) labels on Nov 11, 2024
