modeld rebuilds after first pass #34010

adeebshihadeh · 2024-11-13T03:53:46Z

Likely after da952e9

enqueue   2.90 ms -- total run  19.70 ms
enqueue   2.91 ms -- total run  19.42 ms
enqueue   2.76 ms -- total run  19.23 ms
enqueue   2.68 ms -- total run  19.42 ms
enqueue   2.84 ms -- total run  19.74 ms
enqueue   2.83 ms -- total run  19.91 ms
enqueue   2.86 ms -- total run  19.53 ms
enqueue   2.90 ms -- total run  19.19 ms
enqueue   3.34 ms -- total run  19.58 ms
enqueue   3.06 ms -- total run  19.42 ms
enqueue   2.70 ms -- total run  19.65 ms
enqueue   2.84 ms -- total run  19.46 ms
enqueue   2.88 ms -- total run  19.24 ms
enqueue   2.66 ms -- total run  19.73 ms
enqueue   2.85 ms -- total run  19.65 ms
enqueue   2.67 ms -- total run  19.70 ms
{'outputs': <Tensor <LB QCOM (1, 6500) float (<BinaryOps.ADD: 9>, <buf real:True device:QCOM size:6500 dtype:dtypes.float offset:0>)> on QCOM with grad None>} (1, 6500) float32
**** test done ****
scons: done building targets.
comma@comma-863276c1:/data/openpilot$
comma@comma-863276c1:/data/openpilot$
comma@comma-863276c1:/data/openpilot$ system/manager/build.py
Using Wayland-EGL
MESA: error: ZINK: vkCreateInstance failed (VK_ERROR_INCOMPATIBLE_DRIVER)
libEGL warning: egl: failed to create dri2 screen
qt.qpa.wayland: "wl-shell" is a deprecated shell extension, prefer using "xdg-shell-v6" or "xdg-shell" if supported by the compositor by setting the environment variable QT_WAYLAND_SHELL_INTEGRATION
Using the 'wl-shell' shell integration
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
PYTHONPATH=":/data/openpilot/tinygrad_repo" QCOM=1 python3 /data/openpilot/tinygrad_repo/examples/openpilot/compile3.py /data/openpilot/selfdrive/modeld/models/dmonitoring_model.onnx /data/openpilot/selfdrive/modeld/models/dmonitoring_model_tinygrad.pkl
python3 /data/openpilot/selfdrive/modeld/get_model_metadata.py /data/openpilot/selfdrive/modeld/models/supercombo.onnx
PYTHONPATH=":/data/openpilot/tinygrad_repo" QCOM=1 python3 /data/openpilot/tinygrad_repo/examples/openpilot/compile3.py /data/openpilot/selfdrive/modeld/models/supercombo.onnx /data/openpilot/selfdrive/modeld/models/supercombo_tinygrad.pkl
loaded model
saved metadata to /data/openpilot/selfdrive/modeld/models/supercombo_metadata.pkl
loaded model
created tensors
run 0
opened device CLANG from pid:52394
scheduled 569 kernels
memory reduced from 17.40 MB -> 15.36 MB, 310 -> 264 bufs
created tensors
run 0
opened device CLANG from pid:52405
scheduled 744 kernels
memory reduced from 67.93 MB -> 61.83 MB, 421 -> 298 bufs
run 1
scheduled 311 kernels
JIT captured 320 kernels with 2 inputs
pruned from 320 -> 136 kernels
JIT memory reduced from 9.76 MB -> 7.72 MB, 129 -> 93 bufs
run 2
JIT GRAPHing batch with 130 kernels on device <tinygrad.runtime.ops_qcom.QCOMDevice object at 0x7f650d1760>
*** CLANG      1 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm    148.07us/     0.15ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** CLANG      2 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm     93.07us/     0.24ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** CLANG      3 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm     87.45us/     0.33ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** CLANG      4 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm     89.48us/     0.42ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** CLANG      5 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm     84.90us/     0.50ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** CLANG      6 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm     83.39us/     0.59ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** QCOM       7 <batched 130>                             arg  2 mem  0.03 GB tm   7889.92us/     8.48ms (   108.41 GFLOPS    3.6|127.9   GB/s)
*** CLANG      8 copy     2384,   CLANG <- QCOM            arg  2 mem  0.03 GB tm    166.10us/     8.64ms (     0.00 GFLOPS    0.0|0.0     GB/s)
captured 136 kernels
jit run validated
mdl size is 7.20M
pkl size is 15.50M
**** compile done ****
enqueue  72.42 ms -- total run  79.79 ms
enqueue   2.25 ms -- total run  10.05 ms
enqueue   1.94 ms -- total run   9.50 ms
enqueue   1.85 ms -- total run   9.29 ms
enqueue   2.01 ms -- total run   9.56 ms
enqueue   2.25 ms -- total run   9.75 ms
enqueue   1.90 ms -- total run   9.91 ms
enqueue   2.13 ms -- total run   9.57 ms
enqueue   1.83 ms -- total run   9.57 ms
enqueue   2.10 ms -- total run   9.77 ms
enqueue   1.64 ms -- total run   9.69 ms
enqueue   2.10 ms -- total run   9.74 ms
enqueue   1.82 ms -- total run   9.35 ms
enqueue   1.81 ms -- total run   9.85 ms
enqueue   1.93 ms -- total run   9.48 ms
enqueue   1.88 ms -- total run   9.39 ms
enqueue   1.89 ms -- total run   9.89 ms
enqueue   1.89 ms -- total run   9.35 ms
enqueue   2.14 ms -- total run   9.76 ms
enqueue   1.88 ms -- total run   9.39 ms
{'outputs': <Tensor <LB QCOM (1, 596) float (<BinaryOps.ADD: 9>, <buf real:True device:QCOM size:596 dtype:dtypes.float offset:0>)> on QCOM with grad None>} (1, 596) float32
**** test done ****
run 1
scheduled 422 kernels
JIT captured 422 kernels with 7 inputs
pruned from 422 -> 194 kernels
JIT memory reduced from 11.92 MB -> 5.83 MB, 193 -> 79 bufs
run 2
JIT GRAPHing batch with 194 kernels on device <tinygrad.runtime.ops_qcom.QCOMDevice object at 0x7f9a947ce0>
*** QCOM       1 <batched 194>                             arg  7 mem  0.16 GB tm     16.29ms/    16.29ms (   106.46 GFLOPS    4.8|118.1   GB/s)
*** CLANG      2 copy    26000,   CLANG <- QCOM            arg  2 mem  0.16 GB tm    287.19us/    16.57ms (     0.00 GFLOPS    0.1|0.1     GB/s)
captured 194 kernels
jit run validated
mdl size is 50.32M
pkl size is 57.48M
**** compile done ****
enqueue 107.56 ms -- total run 124.21 ms
enqueue   2.93 ms -- total run  19.86 ms
enqueue   2.91 ms -- total run  19.30 ms
enqueue   2.91 ms -- total run  19.40 ms
enqueue   2.76 ms -- total run  18.96 ms
enqueue   3.00 ms -- total run  19.69 ms
enqueue   3.22 ms -- total run  20.30 ms
enqueue   2.90 ms -- total run  19.38 ms
enqueue   2.84 ms -- total run  19.62 ms
enqueue   2.82 ms -- total run  19.73 ms
enqueue   2.92 ms -- total run  19.73 ms
enqueue   2.90 ms -- total run  19.73 ms
enqueue   2.95 ms -- total run  19.61 ms
enqueue   4.71 ms -- total run  21.65 ms
enqueue   2.86 ms -- total run  19.13 ms
enqueue   2.76 ms -- total run  19.72 ms
enqueue   2.90 ms -- total run  19.69 ms
enqueue   2.80 ms -- total run  19.43 ms
enqueue   2.87 ms -- total run  19.51 ms
enqueue   2.66 ms -- total run  19.46 ms
{'outputs': <Tensor <LB QCOM (1, 6500) float (<BinaryOps.ADD: 9>, <buf real:True device:QCOM size:6500 dtype:dtypes.float offset:0>)> on QCOM with grad None>} (1, 6500) float32
**** test done ****

scons: done building targets.

adeebshihadeh added the bug label Nov 13, 2024

adeebshihadeh added this to the 0.9.8 milestone Nov 13, 2024
