Incorrect exit code #391

Open
victorserbu2709 opened this issue Nov 29, 2024 · 0 comments

Nomad Podman driver version: v0.6.1

When a container takes a long time to stop, the allocation's reported exit code is sometimes 0 and sometimes 137.

[root@nomadtesting test-image]# nomad job status redis 
ID            = redis
Name          = redis
Submit Date   = 2024-11-29T12:56:38+02:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Node Pool     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
redis       0       0         1        4       0         0     0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
68808dc9  253a8f99  redis       0        run      running  17m13s ago  16m57s ago
b897b38f  253a8f99  redis       0        stop     failed   23m1s ago   17m13s ago
cf225248  253a8f99  redis       0        stop     failed   32m42s ago  23m1s ago
e20f42db  253a8f99  redis       0        stop     failed   1h15m ago   32m42s ago
[root@nomadtesting test-image]# nomad alloc status e2
ID                   = e20f42db-31cd-f245-c548-6f9f5409bea2
Eval ID              = 942ad545
Name                 = redis.redis[0]
Node ID              = 253a8f99
Node Name            = nomadtesting.novalocal
Job ID               = redis
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 1h16m ago
Modified             = 33m38s ago
Replacement Alloc ID = cf225248

Task "redis" is "dead"
Task Resources:
CPU        Memory           Disk     Addresses
0/500 MHz  692 KiB/256 MiB  300 MiB  

Task Events:
Started At     = 2024-11-29T10:58:53Z
Finished At    = 2024-11-29T11:40:57Z
Total Restarts = 1
Last Restart   = 2024-11-29T13:37:52+02:00

Recent Events:
Time                       Type              Description
2024-11-29T13:40:57+02:00  Not Restarting    Error was unrecoverable
2024-11-29T13:40:57+02:00  Driver Failure    rpc error: code = FailedPrecondition desc = failed to remove dead container: cannot delete container, status code: 200
2024-11-29T13:37:52+02:00  Restarting        Task restarting in 0s
2024-11-29T13:35:29+02:00  Terminated        Exit Code: 137
2024-11-29T13:33:45+02:00  Restart Signaled  Template with change_mode restart re-rendered
2024-11-29T12:58:53+02:00  Started           Task started by client
2024-11-29T12:58:52+02:00  Task Setup        Building Task Directory
2024-11-29T12:58:52+02:00  Received          Task received by client
[root@nomadtesting test-image]# nomad alloc status cf
ID                   = cf225248-9e5c-0219-2624-9e6b6cb5010b
Eval ID              = cdb4feb7
Name                 = redis.redis[0]
Node ID              = 253a8f99
Node Name            = nomadtesting.novalocal
Job ID               = redis
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 34m14s ago
Modified             = 24m33s ago
Replacement Alloc ID = b897b38f

Task "redis" is "dead"
Task Resources:
CPU        Memory           Disk     Addresses
0/500 MHz  688 KiB/256 MiB  300 MiB  

Task Events:
Started At     = 2024-11-29T11:42:11Z
Finished At    = 2024-11-29T11:49:38Z
Total Restarts = 1
Last Restart   = 2024-11-29T13:49:23+02:00

Recent Events:
Time                       Type              Description
2024-11-29T13:49:38+02:00  Not Restarting    Error was unrecoverable
2024-11-29T13:49:38+02:00  Driver Failure    rpc error: code = FailedPrecondition desc = failed to remove dead container: cannot delete container, status code: 200
2024-11-29T13:49:23+02:00  Restarting        Task restarting in 0s
2024-11-29T13:49:14+02:00  Terminated        Exit Code: 0
2024-11-29T13:48:56+02:00  Restart Signaled  Template with change_mode restart re-rendered
2024-11-29T13:42:11+02:00  Started           Task started by client
2024-11-29T13:41:57+02:00  Task Setup        Building Task Directory
2024-11-29T13:41:57+02:00  Received          Task received by client

This happens because, after a stop command is sent to a running container, the stats request
curl -v -s --unix-socket /run/podman/podman.sock http://d/v1.0.0/libpod/containers/$container_id/stats?stream=false
returns 200 with an empty body, which causes runContainerMonitor to call ContainerInspect. If the container has not been killed yet at that point, the inspect result reports ExitCode 0:

[root@nomadtesting test-image]# curl -v  -s --unix-socket /run/podman/podman.sock http://d/v1.0.0/libpod/containers/$container_id/json | jq  | grep -i exitcode
*   Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80 (#0)
> GET /v1.0.0/libpod/containers/387834dddaae1e141763740b37b6bec33d39e6bba998cc333370197ee2cf12be/json HTTP/1.1
> Host: d
> User-Agent: curl/7.76.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Api-Version: 1.41
< Content-Type: application/json
< Libpod-Api-Version: 5.2.2
< Server: Libpod/5.2.2 (linux)
< X-Reference-Id: 0xc00070a000
< Date: Fri, 29 Nov 2024 12:09:30 GMT
< Transfer-Encoding: chunked
< 
{ [6334 bytes data]
* Connection #0 to host d left intact
    "ExitCode": 0,
  "KubeExitCodePropagation": "invalid",

After the container has been killed, the same inspect request reports the correct exit code:

*   Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80 (#0)
> GET /v1.0.0/libpod/containers/387834dddaae1e141763740b37b6bec33d39e6bba998cc333370197ee2cf12be/json HTTP/1.1
> Host: d
> User-Agent: curl/7.76.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Api-Version: 1.41
< Content-Type: application/json
< Libpod-Api-Version: 5.2.2
< Server: Libpod/5.2.2 (linux)
< X-Reference-Id: 0xc00070a990
< Date: Fri, 29 Nov 2024 12:20:52 GMT
< Transfer-Encoding: chunked
< 
{ [6070 bytes data]
* Connection #0 to host d left intact
    "ExitCode": 137,
  "KubeExitCodePropagation": "invalid",
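
To illustrate the race described above, here is a minimal Python sketch of one possible guard: treat the ExitCode from an inspect response as final only once the container state says it has actually finished. This is not the driver's code (the driver is written in Go). The helper names `final_exit_code` and `describe` are hypothetical, and the `State.Status` / `State.Running` fields and the status strings `exited` and `stopped` are assumptions about the inspect JSON that should be checked against the Podman version in use.

```python
import json

# Assumption: status strings in which the inspect ExitCode is final.
FINAL_STATES = {"exited", "stopped"}


def final_exit_code(inspect_json: str):
    """Return the exit code if the container has finished, else None."""
    state = json.loads(inspect_json).get("State", {})
    status = str(state.get("Status", "")).lower()
    if state.get("Running") or status not in FINAL_STATES:
        # Container is still running or stopping: ExitCode is only
        # the default 0 here and must not be reported.
        return None
    return state.get("ExitCode")


def describe(code: int) -> str:
    """Decode the common convention of 128 + signal number."""
    if code > 128:
        return f"killed by signal {code - 128}"
    return f"exited with status {code}"
```

With a guard like this, a monitor loop could re-run the inspect while the result is None instead of recording 0. The value 137 seen after the kill decodes as 128 + 9, that is, the container was terminated with SIGKILL.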