Through this issue I selected the expected high-performance operator, but found a problem: if the data type is float, the result is correct, but if the data type is half, the result is incorrect.
I captured the output log below with the VERBOSE=ON compile option, and the fp16 output is obviously incorrect. I suspected data overflow, but 18000 does not exceed the range of FP16 (largest finite value 65504), so do you have any ideas?
[DEBUG] Compiling routine 'GEMV-32 (single)'
[DEBUG] Completed compilation in 150.38 ms
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 13.41 ms
No. 0 GEMV execution time: 169.471 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_1_64_1_XgemvFastRot_8_32_32_TrsvRoutine_32
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.29 ms
No. 1 GEMV execution time: 4.36427 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_1_64_1_XgemvFastRot_8_32_32_TrsvRoutine_32
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.34 ms
No. 2 GEMV execution time: 4.44021 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_1_64_1_XgemvFastRot_8_32_32_TrsvRoutine_32
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.20 ms
No. 3 GEMV execution time: 4.30474 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_1_64_1_XgemvFastRot_8_32_32_TrsvRoutine_32
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.79 ms
No. 4 GEMV execution time: 5.02594 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_1_64_1_XgemvFastRot_8_32_32_TrsvRoutine_32
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.48 ms
No. 5 GEMV execution time: 4.85479 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_1_64_1_XgemvFastRot_8_32_32_TrsvRoutine_32
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.64 ms
No. 6 GEMV execution time: 4.93505 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_1_64_1_XgemvFastRot_8_32_32_TrsvRoutine_32
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.87 ms
No. 7 GEMV execution time: 5.31823 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_1_64_1_XgemvFastRot_8_32_32_TrsvRoutine_32
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 5.06 ms
No. 8 GEMV execution time: 5.23974 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_1_64_1_XgemvFastRot_8_32_32_TrsvRoutine_32
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.83 ms
No. 9 GEMV execution time: 5.00667 ms
GEMV execution time: 21.2961 ms
Result : 18019.4 18019.4 18019.4 18019.4 18019.4 18019.4 18019.4 18019.4 18019.4 18019.4
[DEBUG] Searching database for kernel 'Xgemv'
[DEBUG] Device type 'GPU'; vendor 'QUALCOMM'
[DEBUG] Device name 'QUALCOMM Adreno(TM) 750'; architecture 'OpenCL C 3.0 Adreno(TM) 750'
[DEBUG] Found architectures of vendor 'QUALCOMM' and type 'GPU'
[DEBUG] Found devices of architecture type 'default'
[DEBUG] Found parameters for device type 'default'
[DEBUG] Searching database for kernel 'XgemvFast'
[DEBUG] Device type 'GPU'; vendor 'QUALCOMM'
[DEBUG] Device name 'QUALCOMM Adreno(TM) 750'; architecture 'OpenCL C 3.0 Adreno(TM) 750'
[DEBUG] Found architectures of vendor 'QUALCOMM' and type 'GPU'
[DEBUG] Found devices of architecture type 'default'
[DEBUG] Found parameters for device type 'default'
[DEBUG] Searching database for kernel 'XgemvFastRot'
[DEBUG] Device type 'GPU'; vendor 'QUALCOMM'
[DEBUG] Device name 'QUALCOMM Adreno(TM) 750'; architecture 'OpenCL C 3.0 Adreno(TM) 750'
[DEBUG] Found architectures of vendor 'QUALCOMM' and type 'GPU'
[DEBUG] Found devices of architecture type 'default'
[DEBUG] Found parameters for device type 'default'
[DEBUG] Searching database for kernel 'TrsvRoutine'
[DEBUG] Device type 'GPU'; vendor 'QUALCOMM'
[DEBUG] Device name 'QUALCOMM Adreno(TM) 750'; architecture 'OpenCL C 3.0 Adreno(TM) 750'
[DEBUG] Found architectures of vendor 'default' and type 'default'
[DEBUG] Found devices of architecture type 'default'
[DEBUG] Found parameters for device type 'default'
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_4_32_4_XgemvFastRot_8_16_8_TrsvRoutine_24
[DEBUG] Compiling routine 'GEMV-16 (half)'
[DEBUG] Completed compilation in 125.75 ms
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 14.59 ms
No. 0 GEMV execution time: 149.68 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_4_32_4_XgemvFastRot_8_16_8_TrsvRoutine_24
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.91 ms
No. 1 GEMV execution time: 4.99323 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_4_32_4_XgemvFastRot_8_16_8_TrsvRoutine_24
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.62 ms
No. 2 GEMV execution time: 4.75443 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_4_32_4_XgemvFastRot_8_16_8_TrsvRoutine_24
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.75 ms
No. 3 GEMV execution time: 4.83063 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_4_32_4_XgemvFastRot_8_16_8_TrsvRoutine_24
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.66 ms
No. 4 GEMV execution time: 4.86604 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_4_32_4_XgemvFastRot_8_16_8_TrsvRoutine_24
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.80 ms
No. 5 GEMV execution time: 4.88469 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_4_32_4_XgemvFastRot_8_16_8_TrsvRoutine_24
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.69 ms
No. 6 GEMV execution time: 4.83239 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_4_32_4_XgemvFastRot_8_16_8_TrsvRoutine_24
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 5.17 ms
No. 7 GEMV execution time: 5.66073 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_4_32_4_XgemvFastRot_8_16_8_TrsvRoutine_24
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.88 ms
No. 8 GEMV execution time: 5.12714 ms
[DEBUG] GEMV_Xgemv_64_1_XgemvFast_4_32_4_XgemvFastRot_8_16_8_TrsvRoutine_24
[DEBUG] Running kernel 'XgemvFast'
[DEBUG] Completed kernel in 4.86 ms
No. 9 GEMV execution time: 5.03026 ms
GEMV execution time: 19.4659 ms
Result : 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096
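The overflow hypothesis can be checked numerically. The sketch below emulates IEEE 754 half precision in Python with the standard `struct` module's `'e'` format (round-half-to-even). This emulation stands in for the GPU's arithmetic and is an assumption. It shows that 18019.4 is representable to within one fp16 step (it rounds to 18016), so the final value does not overflow. It also shows that a running sum kept in fp16 can stall at a power of two. Once the sum reaches 4096, the spacing between adjacent fp16 values is 4, so adding a product of 2 or less gets rounded away. The inputs (12000 products of 1.5) are hypothetical, not the reporter's data. The log also does not show whether the XgemvFast kernel actually accumulates in half precision on this device.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 binary16 value


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest fp16 value (round-half-to-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_fp16(a, b):
    """Dot product where inputs, products and the running sum are all rounded to fp16."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + prod)  # the accumulator itself is kept in fp16
    return acc


def dot_reference(a, b):
    """Reference dot product accumulated in Python floats (double precision)."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # The expected magnitude fits in fp16; it only loses a little precision.
    print(to_fp16(18019.4), 18019.4 < FP16_MAX)
    # Hypothetical row: 12000 products of 1.5 * 1.0 (not the reporter's inputs).
    n = 12000
    a = [1.5] * n
    x = [1.0] * n
    print(dot_reference(a, x), dot_fp16(a, x))
```

Running it prints `18016.0 True` and then `18000.0 4096.0`. The reference sum is correct, while the fp16 accumulator stops growing at 4096.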
Thanks for sharing the complete test results here. I looked at your code briefly but I don't see anything obviously wrong. A few things to try:
Add a clFinish(queue) after your clEnqueueReadBuffer(queue, ...) calls.
Try with smaller input sizes.
Try with different values.
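The suggestions to try smaller sizes and different values can be illustrated with the same `struct`-based fp16 emulation. This emulation is an assumption and was not measured on the Adreno GPU. The sketch finds the first length at which an fp16 running sum of a constant diverges from the exact sum. `first_divergence` and `fp16_running_sum` are hypothetical helpers written for this illustration. With a product of 1.5 the sum diverges at n = 683, when it crosses 1024. With 1.0 it diverges at n = 2049. If the accumulation really is in half precision, small inputs could therefore look correct while larger ones fail.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value (round-half-to-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_running_sum(value: float, n: int) -> float:
    """Add `value` n times, rounding the running sum to fp16 after every addition."""
    v = to_fp16(value)
    acc = 0.0
    for _ in range(n):
        acc = to_fp16(acc + v)
    return acc


def first_divergence(value: float, max_n: int):
    """Hypothetical helper: smallest n <= max_n for which the fp16 running sum of
    n copies of `value` differs from the exact sum n * value, or None if none does."""
    v = to_fp16(value)
    acc = 0.0
    for n in range(1, max_n + 1):
        acc = to_fp16(acc + v)
        if acc != n * v:  # n * v is exact in double precision for these sizes
            return n
    return None


if __name__ == "__main__":
    for value in (1.5, 1.0):
        print(f"value={value}: first divergence at n={first_divergence(value, 20000)}")
    for n in (64, 512, 4096, 12000):
        print(f"n={n}: exact={n * 1.5}, fp16={fp16_running_sum(1.5, n)}")
```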
But perhaps a better thing is to try to run the CLBlast tests themselves. Run CMake with -DTESTS=ON (make sure you have a reference BLAS installed for comparison, e.g. OpenBLAS or MKL) and then run the appropriate test, e.g.: