modeld rebuilds after first pass #34010

Open
adeebshihadeh opened this issue Nov 13, 2024 · 0 comments
adeebshihadeh commented Nov 13, 2024

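Likely started after da952e9. The log below shows the end of a completed build, followed by a second system/manager/build.py run that recompiles both models again instead of finding them up to date.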

enqueue   2.90 ms -- total run  19.70 ms
enqueue   2.91 ms -- total run  19.42 ms
enqueue   2.76 ms -- total run  19.23 ms
enqueue   2.68 ms -- total run  19.42 ms
enqueue   2.84 ms -- total run  19.74 ms
enqueue   2.83 ms -- total run  19.91 ms
enqueue   2.86 ms -- total run  19.53 ms
enqueue   2.90 ms -- total run  19.19 ms
enqueue   3.34 ms -- total run  19.58 ms
enqueue   3.06 ms -- total run  19.42 ms
enqueue   2.70 ms -- total run  19.65 ms
enqueue   2.84 ms -- total run  19.46 ms
enqueue   2.88 ms -- total run  19.24 ms
enqueue   2.66 ms -- total run  19.73 ms
enqueue   2.85 ms -- total run  19.65 ms
enqueue   2.67 ms -- total run  19.70 ms
{'outputs': <Tensor <LB QCOM (1, 6500) float (<BinaryOps.ADD: 9>, <buf real:True device:QCOM size:6500 dtype:dtypes.float offset:0>)> on QCOM with grad None>} (1, 6500) float32
**** test done ****
scons: done building targets.
comma@comma-863276c1:/data/openpilot$
comma@comma-863276c1:/data/openpilot$
comma@comma-863276c1:/data/openpilot$ system/manager/build.py
Using Wayland-EGL
MESA: error: ZINK: vkCreateInstance failed (VK_ERROR_INCOMPATIBLE_DRIVER)
libEGL warning: egl: failed to create dri2 screen
qt.qpa.wayland: "wl-shell" is a deprecated shell extension, prefer using "xdg-shell-v6" or "xdg-shell" if supported by the compositor by setting the environment variable QT_WAYLAND_SHELL_INTEGRATION
Using the 'wl-shell' shell integration
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
PYTHONPATH=":/data/openpilot/tinygrad_repo" QCOM=1 python3 /data/openpilot/tinygrad_repo/examples/openpilot/compile3.py /data/openpilot/selfdrive/modeld/models/dmonitoring_model.onnx /data/openpilot/selfdrive/modeld/models/dmonitoring_model_tinygrad.pkl
python3 /data/openpilot/selfdrive/modeld/get_model_metadata.py /data/openpilot/selfdrive/modeld/models/supercombo.onnx
PYTHONPATH=":/data/openpilot/tinygrad_repo" QCOM=1 python3 /data/openpilot/tinygrad_repo/examples/openpilot/compile3.py /data/openpilot/selfdrive/modeld/models/supercombo.onnx /data/openpilot/selfdrive/modeld/models/supercombo_tinygrad.pkl
loaded model
saved metadata to /data/openpilot/selfdrive/modeld/models/supercombo_metadata.pkl
loaded model
created tensors
run 0
opened device CLANG from pid:52394
scheduled 569 kernels
memory reduced from 17.40 MB -> 15.36 MB, 310 -> 264 bufs
created tensors
run 0
opened device CLANG from pid:52405
scheduled 744 kernels
memory reduced from 67.93 MB -> 61.83 MB, 421 -> 298 bufs
run 1
scheduled 311 kernels
JIT captured 320 kernels with 2 inputs
pruned from 320 -> 136 kernels
JIT memory reduced from 9.76 MB -> 7.72 MB, 129 -> 93 bufs
run 2
JIT GRAPHing batch with 130 kernels on device <tinygrad.runtime.ops_qcom.QCOMDevice object at 0x7f650d1760>
*** CLANG      1 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm    148.07us/     0.15ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** CLANG      2 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm     93.07us/     0.24ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** CLANG      3 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm     87.45us/     0.33ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** CLANG      4 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm     89.48us/     0.42ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** CLANG      5 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm     84.90us/     0.50ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** CLANG      6 copy        4,   CLANG <- QCOM            arg  2 mem  0.03 GB tm     83.39us/     0.59ms (     0.00 GFLOPS    0.0|0.0     GB/s)
*** QCOM       7 <batched 130>                             arg  2 mem  0.03 GB tm   7889.92us/     8.48ms (   108.41 GFLOPS    3.6|127.9   GB/s)
*** CLANG      8 copy     2384,   CLANG <- QCOM            arg  2 mem  0.03 GB tm    166.10us/     8.64ms (     0.00 GFLOPS    0.0|0.0     GB/s)
captured 136 kernels
jit run validated
mdl size is 7.20M
pkl size is 15.50M
**** compile done ****
enqueue  72.42 ms -- total run  79.79 ms
enqueue   2.25 ms -- total run  10.05 ms
enqueue   1.94 ms -- total run   9.50 ms
enqueue   1.85 ms -- total run   9.29 ms
enqueue   2.01 ms -- total run   9.56 ms
enqueue   2.25 ms -- total run   9.75 ms
enqueue   1.90 ms -- total run   9.91 ms
enqueue   2.13 ms -- total run   9.57 ms
enqueue   1.83 ms -- total run   9.57 ms
enqueue   2.10 ms -- total run   9.77 ms
enqueue   1.64 ms -- total run   9.69 ms
enqueue   2.10 ms -- total run   9.74 ms
enqueue   1.82 ms -- total run   9.35 ms
enqueue   1.81 ms -- total run   9.85 ms
enqueue   1.93 ms -- total run   9.48 ms
enqueue   1.88 ms -- total run   9.39 ms
enqueue   1.89 ms -- total run   9.89 ms
enqueue   1.89 ms -- total run   9.35 ms
enqueue   2.14 ms -- total run   9.76 ms
enqueue   1.88 ms -- total run   9.39 ms
{'outputs': <Tensor <LB QCOM (1, 596) float (<BinaryOps.ADD: 9>, <buf real:True device:QCOM size:596 dtype:dtypes.float offset:0>)> on QCOM with grad None>} (1, 596) float32
**** test done ****
run 1
scheduled 422 kernels
JIT captured 422 kernels with 7 inputs
pruned from 422 -> 194 kernels
JIT memory reduced from 11.92 MB -> 5.83 MB, 193 -> 79 bufs
run 2
JIT GRAPHing batch with 194 kernels on device <tinygrad.runtime.ops_qcom.QCOMDevice object at 0x7f9a947ce0>
*** QCOM       1 <batched 194>                             arg  7 mem  0.16 GB tm     16.29ms/    16.29ms (   106.46 GFLOPS    4.8|118.1   GB/s)
*** CLANG      2 copy    26000,   CLANG <- QCOM            arg  2 mem  0.16 GB tm    287.19us/    16.57ms (     0.00 GFLOPS    0.1|0.1     GB/s)
captured 194 kernels
jit run validated
mdl size is 50.32M
pkl size is 57.48M
**** compile done ****
enqueue 107.56 ms -- total run 124.21 ms
enqueue   2.93 ms -- total run  19.86 ms
enqueue   2.91 ms -- total run  19.30 ms
enqueue   2.91 ms -- total run  19.40 ms
enqueue   2.76 ms -- total run  18.96 ms
enqueue   3.00 ms -- total run  19.69 ms
enqueue   3.22 ms -- total run  20.30 ms
enqueue   2.90 ms -- total run  19.38 ms
enqueue   2.84 ms -- total run  19.62 ms
enqueue   2.82 ms -- total run  19.73 ms
enqueue   2.92 ms -- total run  19.73 ms
enqueue   2.90 ms -- total run  19.73 ms
enqueue   2.95 ms -- total run  19.61 ms
enqueue   4.71 ms -- total run  21.65 ms
enqueue   2.86 ms -- total run  19.13 ms
enqueue   2.76 ms -- total run  19.72 ms
enqueue   2.90 ms -- total run  19.69 ms
enqueue   2.80 ms -- total run  19.43 ms
enqueue   2.87 ms -- total run  19.51 ms
enqueue   2.66 ms -- total run  19.46 ms
{'outputs': <Tensor <LB QCOM (1, 6500) float (<BinaryOps.ADD: 9>, <buf real:True device:QCOM size:6500 dtype:dtypes.float offset:0>)> on QCOM with grad None>} (1, 6500) float32
**** test done ****

scons: done building targets.
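
One way to check for this behaviour from a captured log is to list which model artifacts scons recompiled. The following is only a minimal sketch: `rebuilt_models` is a hypothetical helper, not part of openpilot or tinygrad, and it assumes scons echoes each command between the "Building targets" and "done building targets" markers exactly as in the log above. For a second build with no source changes, the expectation would be an empty list.

```python
import shlex

# Markers printed by scons around the build phase, as seen in the log above.
BUILD_START = "scons: Building targets ..."
BUILD_END = "scons: done building targets."


def rebuilt_models(log_text: str) -> list[str]:
    """Return the output paths of compile3.py commands echoed during a build.

    Hypothetical helper: it assumes the output path is the last argument of
    the echoed command line, which matches the invocations in this log.
    """
    outputs = []
    in_build = False
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line == BUILD_START:
            in_build = True
        elif line == BUILD_END:
            in_build = False
        elif in_build and "compile3.py" in line:
            # shlex handles the quoted PYTHONPATH="..." prefix correctly.
            outputs.append(shlex.split(line)[-1])
    return outputs
```

Running it on the output of a second `system/manager/build.py` call and getting a non-empty list would reproduce the problem reported here.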
@adeebshihadeh adeebshihadeh added this to the 0.9.8 milestone Nov 13, 2024