[BUG] Deployment error #1176
Comments
Howdy 🖐 rumart ! Thank you for your interest in this project. We value your feedback and will respond soon. |
Hi @rumart, the |
Yeah, so when I comment out setup-04 it doesn't pick up the BOM variable. But since it gets defined in setup-05 anyway, could that definition just be moved up a bit? Or should it be removed altogether? Thanks for looking into it |
I don't think that the issue is caused by not setting the |
I agree, the VEBA_BOM_FILE issue occurs because I re-ran the script without running setup-04, which sets it the first time. I was thinking more of fixing that setup-05 file separately. Anyway, I'm running it on a small home lab vSAN cluster. I have tried redeploying a few times, and every attempt stops on the same error message. I'll try to run it on a different environment later tonight to see if that changes anything |
I've tried on a single ESXi host not running anything else, with storage on NVMe. I've added more CPU and RAM to the appliance. It still errors out on the same step. I ssh'd to the appliance as soon as it was available and tailed the bootstrap-debug.log. The error |
IIRC, the 10 minutes are the default for the |
I suspect that the current "wait" conditions are actually passing, unless you log in and it looks to be waiting for the default 10m as mentioned by Robert. If it truly is a timing issue, we can always enhance the OVF properties to make that customizable, but I'm not sure that's actually the case, and we may need some other wait condition. If we can debug this further, Robert, then we can spin up a custom build to verify for @rumart |
Just as @rumart, first error I got:
Second try, I increased the timeout value and kept going. Third try stumbled upon the following:
Which I had to work around to keep the installation going. |
@rumart I owe you a deep apology for not getting back to you earlier. Would you be open to troubleshoot your issue further? I've just added another
From there you can perfectly follow the progress. The new build can be downloaded for testing purposes here: DOWNLOAD |
Thanks @rguske. I've been busy with other things so haven't had the time myself. |
Sure, just let me know when you have the time and ping me on Discord or Slack (CNCF Workspace). Looking forward to finding the root cause. |
It seems I cannot download the test version.
On 13 Jun 2024, at 08:58, Robert Guske wrote:
Thanks @rguske. I've been busy with other things so haven't had the time myself. I'm very interested in troubleshooting further and getting this up and running.
Sure, just let me know when you have the time and ping me on Discord or Slack (CNCF Workspace). Looking forward to finding the root cause.
|
I've authorized you now 👍🏻 |
Just to add in: yesterday we were on vCenter 7.0.3 and I was able to deploy. Today, after an update to vCenter 8.0.2, I get the same error as @rumart.
rabbitmqcluster.rabbitmq.com/veba-rabbit created
Error from server (InternalError): error when creating "/root/config/knative/rabbit.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev": failed to call webhook: Post "https://rabbitmq-broker-webhook.knative-eventing.svc:443/defaulting?timeout=2s": dial tcp 10.98.98.40:443: connect: connection refused |
Thanks a lot for your input @benwa. I don't think this issue is related to the vSphere version, since the first "real" interaction with the vCenter Server is at line 22 in script 06. when the |
Welp, I redownloaded the OVA from the Flings site and ran a checksum. It was different. Redeployed and I'm all good now. |
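For anyone who wants to rule out a corrupted download before redeploying, a minimal sketch of a checksum comparison follows. The helper name `verify_sha256` and the file name in the usage note are illustrative assumptions, not part of the VEBA tooling; the expected digest has to come from wherever the OVA is published.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VEBA): compare a file's SHA-256 digest
# against an expected value, for example the one published with the download.
verify_sha256() {
  local file="$1" expected="$2" actual
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 2
  fi
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" | cut -d' ' -f1)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

It would be called as `verify_sha256 <downloaded-ova> <published-digest>`, where both arguments are placeholders. A non-zero exit status means the file is missing or differs from the published one.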
Eh… I still can’t deploy it. Even with a new test version provided by @rguske.
On 26 Jun 2024, at 18:29, William Lam wrote:
Closed #1176 as completed.
|
Issue still exists. |
Now I'm able to deploy successfully. Tested several times without issues |
Interesting! Thanks a lot for verifying, Rudi. However, I will try to narrow it down. There must be a different way. |
First-time VEBA user eager to get this working, but I'm also experiencing this issue. /var/log/bootstrap-debug.log
|
Thanks for reporting it. Could you please try the version provided in this comment HERE? Thanks |
That link doesn't work anymore. Google Drive says: Sorry, the file you have requested does not exist. Make sure that you have the correct URL and the file exists. |
I will provide a new link in a bit. I was on vacation and am back on the issue now. The issue looks similar to what is described here: https://cert-manager.io/docs/troubleshooting/webhook/
So, it looks to me that the Kubernetes API server is trying to call the
Even though the following is included in our script, which should ensure that everything is in
|
@royiversen78 use this LINK temporarily. |
I'm getting the same issue with this version:
Error from server (InternalError): error when creating "/root/config/knative/rabbit.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.rabbitmq.eventing.knative.dev": failed to call webhook: Post "https://rabbitmq-broker-webhook.knative-eventing.svc:443/defaulting?timeout=2s": dial tcp 10.108.11.231:443: connect: connection refused |
Add a pause to the 05-knative.sh as a workaround for #1176
@rumart @royiversen78 we added a pause to the installation to make sure the dependent services are available before the next step runs. If you'd like to test it, please DM me (preferably on CNCF Slack) and I will provide you with a download link to the OVA. |
The 15 second sleep fix worked for me. It was a battle to get the OVA rebuilt with the fix, but once the rebuilt OVA was used, the VEBA completed first boot configuration successfully. |
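As an aside to the fixed pause, a retry loop is one possible alternative: it re-runs a step until it succeeds or a limit is reached, so the wait adapts to how long the webhook actually takes to come up. The sketch below is not what the VEBA scripts do; `retry_until_ok` is a hypothetical helper, and whether retrying the `kubectl apply` of `/root/config/knative/rabbit.yaml` is the right step to wrap is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the VEBA scripts): run a command until it
# succeeds or the attempts run out, instead of sleeping for a fixed time.
# Usage: retry_until_ok <attempts> <delay-seconds> <command> [args...]
retry_until_ok() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${i}/${attempts} failed: $*" >&2
    # Only wait when another attempt is still to come.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Assumed usage, up to 30 tries, 5 seconds apart:
#   retry_until_ok 30 5 kubectl apply -f /root/config/knative/rabbit.yaml
```

The trade-off against a plain sleep is that a retry hides the real error until the last attempt, so the messages from the failed attempts are sent to stderr to keep them visible in the bootstrap log.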
Describe the bug
The VEBA deployment doesn't finish and throws an error when deploying the RabbitMQ cluster
To Reproduce
Steps to reproduce the behavior:
I've deployed the OVA as described in the docs
Waited for around 20 minutes, but none of the web endpoints work (Connection refused)
Expected behavior
The deployment to finish and the endpoints to work
Screenshots
Screenshot of bootstrap-debug.log
Version (please complete the following information):
Additional context
When troubleshooting I saw that the deployment stopped in what seems to be setup-05-knative.sh script.
I commented out scripts 1 through 4 in setup.sh and reran setup.sh
After a short while the script stopped with this message:
Checked the setup-05-knative.sh script and found that the VEBA_BOM_FILE variable is defined after it is used in the file
The ytt command on line 44 uses $VEBA_BOM_FILE, but the variable is first defined on line 51.
I moved that line above line 44 and reran setup.sh
Now the deployment could finish and I can access the web endpoints
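The define-before-use problem described above can be illustrated with a small, self-contained sketch. It is not the content of setup-05-knative.sh: the `render` function stands in for the step that consumes the variable, and the path is a placeholder. The `${VAR:?message}` expansion makes the script stop with a clear message when the variable is unset or empty, rather than continuing with an empty value.

```shell
#!/usr/bin/env bash
# Illustration only: render() is a stand-in for the step that reads
# VEBA_BOM_FILE, and the path below is a placeholder, not the real location.
render() {
  # ${VAR:?msg} aborts with msg if VAR is unset or empty.
  : "${VEBA_BOM_FILE:?VEBA_BOM_FILE must be set before rendering}"
  echo "rendering with BOM file ${VEBA_BOM_FILE}"
}

# Define the variable before the first step that uses it.
VEBA_BOM_FILE="/path/to/veba-bom.json"
render
```

With a guard like this, re-running a later setup script on its own fails at the guard with an explicit message instead of at a less obvious point further down.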