
[BUG] crc start error doesn't exit with non-zero exit code #4284

Open
bobbygryzynger opened this issue Jul 24, 2024 · 10 comments

Labels: kind/bug Something isn't working, status/need triage

@bobbygryzynger
General information

  • OS: Linux
  • Hypervisor: KVM
  • Did you run crc setup before starting it (Yes/No)? Yes
  • Running CRC on: Laptop

CRC version

CRC version: 2.22.1+e8068b4
OpenShift version: 4.13.3
Podman version: 4.4.4

CRC status

$ crc status --log-level debug
DEBU CRC version: 2.22.1+e8068b4                  
DEBU OpenShift version: 4.13.3                    
DEBU Podman version: 4.4.4                        
DEBU Running 'crc status'                         
CRC VM:          Running
OpenShift:       Starting (v4.13.3)
RAM Usage:       9.412GB of 16.77GB
Disk Usage:      20.8GB of 79.93GB (Inside the CRC VM)
Cache Usage:     202.3GB
Cache Directory: /home/bgryzyng/.crc/cache

CRC config

$ crc config view
- consent-telemetry                     : yes
- cpus                                  : 6
- disk-size                             : 75
- enable-cluster-monitoring             : true
- host-network-access                   : true
- memory                                : 16384
- network-mode                          : user

Host Operating System

$ cat /etc/os-release
NAME="Fedora Linux"
...

Steps to reproduce

  1. Run crc start
  2. If an error occurs, a non-zero exit code is not provided

Expected

If an error occurs, the exit code should be 1 or greater

Actual

Exit code is zero

Logs

...
INFO 3 operators are progressing: authentication, kube-apiserver, monitoring 
INFO 3 operators are progressing: authentication, kube-apiserver, monitoring 
INFO 3 operators are progressing: authentication, kube-apiserver, monitoring 
INFO 3 operators are progressing: authentication, kube-apiserver, monitoring 
INFO 3 operators are progressing: authentication, kube-apiserver, monitoring 
ERRO Cluster is not ready: cluster operators are still not stable after 10m20.6920432s 
INFO Adding crc-admin and crc-developer contexts to kubeconfig... 
Started the OpenShift cluster.
...
@bobbygryzynger bobbygryzynger added kind/bug Something isn't working status/need triage labels Jul 24, 2024
@praveenkumar
Member

ERRO Cluster is not ready: cluster operators are still not stable after 10m20.6920432s

@bobbygryzynger We display this as an error, but it is really more of a warning (we may change the messaging). Sometimes, due to slow I/O or certificate regeneration, operator reconciliation takes longer than expected, but that doesn't mean the cluster is unusable.

@bobbygryzynger
Author

bobbygryzynger commented Jul 25, 2024

Thanks @praveenkumar, understood. For this particular error, the cluster was unstable after this.

A suggestion: make anything logged at error level exit with a non-zero code. This particular issue could just be a warning, but my experience with it suggests it should remain an error. Perhaps it could stay an error with a suggestion logged on how to increase the timeout (if that's possible).

@praveenkumar
Member

but my experience with it suggests it should still be an error.

@bobbygryzynger Because for you the operators never became stable?

@bobbygryzynger
Author

@praveenkumar, that's right. When I saw this, even after waiting a bit, the operators were still unstable.

@praveenkumar
Member

@praveenkumar, that's right. When I saw this, even after waiting a bit, the operators were still unstable.

In that case, if you are able to access the kube API, try using https://docs.openshift.com/container-platform/4.16/support/troubleshooting/troubleshooting-operator-issues.html to see why an operator is not stable.

@bobbygryzynger
Author

@praveenkumar the operators being unstable is really a separate issue from what I'm requesting here. All I'd really like to see here is that when errors occur, a non-zero code is produced so that my scripts can pick up on that.

@praveenkumar
Member

@bobbygryzynger Right, and as I said, what you observe as an error should be a warning; that is an issue with our messaging. Apart from that, we already return a non-zero exit code when an error happens.

@bobbygryzynger
Author

It seems to me that the cluster being unstable is worthy of an error, but I won't belabor the point.

@bobbygryzynger bobbygryzynger closed this as not planned Jul 31, 2024
@praveenkumar
Member

@bobbygryzynger I am reopening this issue since we haven't changed the messaging yet.

It seems to me that the cluster being unstable is worthy of an error, but I won't belabor the point.

To some extent I agree, but if we error out and skip the next steps, which update the kubeconfig with the user contexts, then there is no easy way to access the cluster API to debug which clusteroperator is misbehaving. That's why I consider this a warning: it lets the user use the API for debugging, and the misbehaving operator may not even be needed by the user, in which case it can be ignored.

@cfergeau
Contributor

cfergeau commented Sep 5, 2024

At some point I do agree but if we error out and not execute next steps

I think the request is that when there is this "cluster not ready" message, after crc completes, echo $? should be non-0. I don't think this asks for crc to exit right away, so it could still try to update ~/.kube/config.
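In other words, the requested behavior is something like the following sketch, where crc start still runs to completion (including the kubeconfig update) but its final exit status reflects the error, so callers can branch on $?:

```shell
#!/bin/sh
# Sketch of the requested behavior: crc start completes its steps
# (including updating ~/.kube/config), then exits non-zero on error,
# so a calling script can branch on $?.
crc start
status=$?
if [ "$status" -ne 0 ]; then
    echo "crc start exited with $status; cluster may not be ready" >&2
    # e.g. fall back to `crc status` or operator debugging here
fi
```

This keeps praveenkumar's concern intact (the kubeconfig contexts are still written for debugging) while giving scripts a reliable signal.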
