Hi Team,
We are trying to run the SONiC community-defined T1-topology test suite using the 202305 release build on the Accton-AS7716-32X platform.
On every test run, we hit a CPU stall at a random point: kernel threads enter a hung state and the NMI watchdog timer is triggered, which results in a board restart.
We observe the same issue with the 202311 and master branch builds (generated from the community Azure pipeline), i.e. the CPU stall is seen with SONiC NOS builds on the Bullseye/5.10 kernel (202311) as well as the Bookworm/6.1 kernel (master).
We consulted the Accton team and shared our observations; they confirmed that it is not a hardware issue and that it needs to be checked further on the community NOS side. We have done some initial analysis based on the collected logs and reported the issue on the community GitHub (links below), but we are currently unable to proceed further or resolve it.
We request your expert advice and any suggestions for resolving this issue.
GitHub issue links for the CPU stall issues:
1. CPU Stall: Soft Lockup issue observed in 202305 Release Branch [Kernel: 5.10.140-1] [Accton-AS7716-32X]
sonic-net/sonic-buildimage#17358
2. CPU Stall: Hard Lockup issue observed in 202305 Release Branch [Kernel: 5.10.140-1] [Accton-AS7716-32X]
sonic-net/sonic-buildimage#17361
sonic-net/sonic-buildimage#17363
Some observations:
- The CPU stalls occur randomly while executing the community test suite.
- They are seen predominantly with T1-topology test cases, not with T0-topology test cases.
- They are seen only when test cases are executed in a batch; when the same test case is executed individually, the issue is not seen.
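
Since the stalls only show up in batch runs, it may help to count the lockup signatures in a saved console or dmesg log for each batch and compare them across runs. The following is a minimal sketch, not part of our current tooling. The function names `classify_line` and `summarize` are hypothetical, and the patterns assume the usual upstream kernel message wording for soft lockups, hard lockups, RCU stalls and hung tasks, which can differ between kernel versions.

```python
import re
from collections import Counter

# Assumed message formats, based on common upstream Linux kernel wording.
# Adjust the patterns if the 5.10 / 6.1 SONiC kernels print them differently.
PATTERNS = {
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!"),
    "hard_lockup": re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)"),
    "rcu_stall": re.compile(r"rcu.*detected stalls? on CPU"),
    "hung_task": re.compile(r"INFO: task (\S+) blocked for more than (\d+) seconds"),
}


def classify_line(line):
    """Return the stall category a log line belongs to, or None if it matches none."""
    for kind, pattern in PATTERNS.items():
        if pattern.search(line):
            return kind
    return None


def summarize(lines):
    """Count how many lines fall into each stall category."""
    counts = Counter()
    for line in lines:
        kind = classify_line(line)
        if kind is not None:
            counts[kind] += 1
    return counts
```

To use it, pass the lines of a captured log (for example `summarize(open(path))` with `path` pointing at a saved dmesg dump) and compare the per-category counts between a batch run and an individual test-case run.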