When a container takes a long time to stop, the allocation's reported exit code is sometimes 0 and sometimes 137.
[root@nomadtesting test-image]# nomad job status redis
ID            = redis
Name          = redis
Submit Date   = 2024-11-29T12:56:38+02:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Node Pool     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
redis       0       0         1        4       0         0     0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
68808dc9  253a8f99  redis       0        run      running  17m13s ago  16m57s ago
b897b38f  253a8f99  redis       0        stop     failed   23m1s ago   17m13s ago
cf225248  253a8f99  redis       0        stop     failed   32m42s ago  23m1s ago
e20f42db  253a8f99  redis       0        stop     failed   1h15m ago   32m42s ago
[root@nomadtesting test-image]# nomad alloc status e2
ID                   = e20f42db-31cd-f245-c548-6f9f5409bea2
Eval ID              = 942ad545
Name                 = redis.redis[0]
Node ID              = 253a8f99
Node Name            = nomadtesting.novalocal
Job ID               = redis
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 1h16m ago
Modified             = 33m38s ago
Replacement Alloc ID = cf225248

Task "redis" is "dead"
Task Resources:
CPU        Memory           Disk     Addresses
0/500 MHz  692 KiB/256 MiB  300 MiB

Task Events:
Started At     = 2024-11-29T10:58:53Z
Finished At    = 2024-11-29T11:40:57Z
Total Restarts = 1
Last Restart   = 2024-11-29T13:37:52+02:00

Recent Events:
Time                       Type              Description
2024-11-29T13:40:57+02:00  Not Restarting    Error was unrecoverable
2024-11-29T13:40:57+02:00  Driver Failure    rpc error: code = FailedPrecondition desc = failed to remove dead container: cannot delete container, status code: 200
2024-11-29T13:37:52+02:00  Restarting        Task restarting in 0s
2024-11-29T13:35:29+02:00  Terminated        Exit Code: 137
2024-11-29T13:33:45+02:00  Restart Signaled  Template with change_mode restart re-rendered
2024-11-29T12:58:53+02:00  Started           Task started by client
2024-11-29T12:58:52+02:00  Task Setup        Building Task Directory
2024-11-29T12:58:52+02:00  Received          Task received by client
[root@nomadtesting test-image]# nomad alloc status cf
ID                   = cf225248-9e5c-0219-2624-9e6b6cb5010b
Eval ID              = cdb4feb7
Name                 = redis.redis[0]
Node ID              = 253a8f99
Node Name            = nomadtesting.novalocal
Job ID               = redis
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 34m14s ago
Modified             = 24m33s ago
Replacement Alloc ID = b897b38f

Task "redis" is "dead"
Task Resources:
CPU        Memory           Disk     Addresses
0/500 MHz  688 KiB/256 MiB  300 MiB

Task Events:
Started At     = 2024-11-29T11:42:11Z
Finished At    = 2024-11-29T11:49:38Z
Total Restarts = 1
Last Restart   = 2024-11-29T13:49:23+02:00

Recent Events:
Time                       Type              Description
2024-11-29T13:49:38+02:00  Not Restarting    Error was unrecoverable
2024-11-29T13:49:38+02:00  Driver Failure    rpc error: code = FailedPrecondition desc = failed to remove dead container: cannot delete container, status code: 200
2024-11-29T13:49:23+02:00  Restarting        Task restarting in 0s
2024-11-29T13:49:14+02:00  Terminated        Exit Code: 0
2024-11-29T13:48:56+02:00  Restart Signaled  Template with change_mode restart re-rendered
2024-11-29T13:42:11+02:00  Started           Task started by client
2024-11-29T13:41:57+02:00  Task Setup        Building Task Directory
2024-11-29T13:41:57+02:00  Received          Task received by client
This happens because, once a stop command has been sent to a running container, the stats request
curl -v -s --unix-socket /run/podman/podman.sock http://d/v1.0.0/libpod/containers/$container_id/stats?stream=false
returns 200 with an empty body. That causes runContainerMonitor to call ContainerInspect, and if the container has not been killed yet, inspect reports ExitCode 0:
[root@nomadtesting test-image]# curl -v -s --unix-socket /run/podman/podman.sock http://d/v1.0.0/libpod/containers/$container_id/json | jq | grep -i exitcode
* Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80 (#0)
> GET /v1.0.0/libpod/containers/387834dddaae1e141763740b37b6bec33d39e6bba998cc333370197ee2cf12be/json HTTP/1.1
> Host: d
> User-Agent: curl/7.76.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Api-Version: 1.41
< Content-Type: application/json
< Libpod-Api-Version: 5.2.2
< Server: Libpod/5.2.2 (linux)
< X-Reference-Id: 0xc00070a000
< Date: Fri, 29 Nov 2024 12:09:30 GMT
< Transfer-Encoding: chunked
<
{ [6334 bytes data]
* Connection #0 to host d left intact
"ExitCode": 0,
"KubeExitCodePropagation": "invalid",
Once the container has actually been killed, the same request returns the correct exit code:
* Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80 (#0)
> GET /v1.0.0/libpod/containers/387834dddaae1e141763740b37b6bec33d39e6bba998cc333370197ee2cf12be/json HTTP/1.1
> Host: d
> User-Agent: curl/7.76.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Api-Version: 1.41
< Content-Type: application/json
< Libpod-Api-Version: 5.2.2
< Server: Libpod/5.2.2 (linux)
< X-Reference-Id: 0xc00070a990
< Date: Fri, 29 Nov 2024 12:20:52 GMT
< Transfer-Encoding: chunked
<
{ [6070 bytes data]
* Connection #0 to host d left intact
"ExitCode": 137,
"KubeExitCodePropagation": "invalid",
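A minimal sketch, in Go since the driver is written in Go, of a guard that would avoid reading the placeholder 0: only accept ExitCode once inspect reports a terminal state, and keep polling otherwise (137 is 128 + 9, i.e. the process was ended by SIGKILL). It assumes the libpod inspect body nests Status, Running and ExitCode under State (only ExitCode is visible in the output above) and that "exited" and "stopped" are the terminal status strings. parseState, exitCodeIsFinal, waitForExitCode and sequence are hypothetical names, not the driver's API; the inspect callback merely stands in for ContainerInspect.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// containerState holds the subset of the libpod inspect "State" object used here.
// ExitCode appears in the issue's output; Status and Running are ASSUMED to sit
// in the same object.
type containerState struct {
	Status   string `json:"Status"`
	Running  bool   `json:"Running"`
	ExitCode int    `json:"ExitCode"`
}

// inspectResponse wraps State the way the inspect endpoint is assumed to nest it.
type inspectResponse struct {
	State containerState `json:"State"`
}

// parseState decodes an inspect response body and returns its State.
func parseState(body []byte) (containerState, error) {
	var resp inspectResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return containerState{}, err
	}
	return resp.State, nil
}

// exitCodeIsFinal reports whether ExitCode can be trusted. Non-terminal states
// such as "running" or "stopping" still carry the default 0.
// The set of terminal status strings is an ASSUMPTION.
func exitCodeIsFinal(s containerState) bool {
	return !s.Running && (s.Status == "exited" || s.Status == "stopped")
}

var errNotExited = errors.New("container has not exited after max attempts")

// waitForExitCode calls inspect (a hypothetical stand-in for ContainerInspect)
// until the state is terminal, sleeping for interval between attempts.
func waitForExitCode(inspect func() (containerState, error), interval time.Duration, maxAttempts int) (int, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		s, err := inspect()
		if err != nil {
			return 0, err
		}
		if exitCodeIsFinal(s) {
			return s.ExitCode, nil
		}
		time.Sleep(interval)
	}
	return 0, errNotExited
}

// sequence returns a fake inspect function that replays bodies in order and
// repeats the last body once the list is exhausted.
func sequence(bodies []string) func() (containerState, error) {
	i := 0
	return func() (containerState, error) {
		s, err := parseState([]byte(bodies[i]))
		if i < len(bodies)-1 {
			i++
		}
		return s, err
	}
}

func main() {
	// Two polls while the stop is still in progress, then the killed container.
	fake := sequence([]string{
		`{"State":{"Status":"running","Running":true,"ExitCode":0}}`,
		`{"State":{"Status":"stopping","Running":false,"ExitCode":0}}`,
		`{"State":{"Status":"exited","Running":false,"ExitCode":137}}`,
	})
	code, err := waitForExitCode(fake, 10*time.Millisecond, 10)
	fmt.Println(code, err) // prints "137 <nil>"
}
```

In the simulated run the two in-progress polls (ExitCode 0) are skipped and 137 is returned once the container is reported as exited. A fixed attempt count is used only to keep the sketch simple; a real monitor would more likely bound the wait by the task's kill timeout.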
nomad podman driver v0.6.1