unhandled error: tokio-runtime-worker panicked - `Err` value: SSHActivateExit(Some(255)) #287

jficz · 2024-08-13T09:14:45Z

After a rather large update I got this unhandled error. Unfortunately, I have not been able to reproduce the problem so far.

Note that this is a local deploy (i.e. the laptop is the machine I'm deploying both from and to), but I don't know whether (or how) a local deploy can be run truly locally rather than over SSH.

user@laptop% deploy -sk --confirm-timeout 1200  .#laptop.system
🚀 ℹ️ [deploy] [INFO] Evaluating flake in .
warning: Git tree '/home/user/git/my-nixos-deployment' is dirty
trace: evaluation warning: The ‘gnome.gnome-keyring’ was moved to top-level. Please use ‘pkgs.gnome-keyring’ directly.
trace: evaluation warning: nixfmt was renamed to nixfmt-classic. The nixfmt attribute may be used for the new RFC 166-style formatter in the future, which is currently available as nixfmt-rfc-style
🚀 ⚠️ [deploy] [WARN] Interactive sudo is enabled! Using a sudo password is less secure than correctly configured SSH keys.
Please use keys in production environments.
🚀 ℹ️ [deploy] [INFO] You will now be prompted for the sudo password for laptop.
(sudo for laptop) Password: 
🚀 ℹ️ [deploy] [INFO] The following profiles are going to be deployed:
[laptop.system]
user = "root"
ssh_user = "user"
path = "/nix/store/...-activatable-nixos-system-laptop-24.11.20240809.5e0ca22"
hostname = "laptop"
ssh_opts = []

🚀 ℹ️ [deploy] [INFO] Building profile `system` for node `laptop`
🚀 ℹ️ [deploy] [INFO] Copying profile `system` to node `laptop`
🚀 ℹ️ [deploy] [INFO] Activating profile `system` for node `laptop`
🚀 ℹ️ [deploy] [INFO] Creating activation waiter
⭐ ℹ️ [activate] [INFO] Activating profile
👀 ℹ️ [wait] [INFO] Waiting for confirmation event...
Copied "/nix/store/...-systemd-256.2/lib/systemd/boot/efi/systemd-bootx64.efi" to "/boot/EFI/systemd/systemd-bootx64.efi".
Copied "/nix/store/...-systemd-256.2/lib/systemd/boot/efi/systemd-bootx64.efi" to "/boot/EFI/BOOT/BOOTX64.EFI".
updating systemd-boot from 255.6 to 256.2
stopping the following units: NetworkManager.service, audit.service, avahi-daemon.service, avahi-daemon.socket, bluetooth.service, cups-browsed.service, cups.service, cups.socket, ensure-printers.service, fwupd.service, kmod-static-nodes.service, logrotate-checkconf.service, mount-pstore.service, network-local-commands.service, network-setup.service, node-red.service, nscd.service, opensnitchd.service, prometheus-node-exporter.service, resolvconf.service, rtkit-daemon.service, systemd-modules-load.service, systemd-oomd.service, systemd-oomd.socket, systemd-sysctl.service, systemd-timesyncd.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-udevd.service, systemd-vconsole-setup.service, systemd-zram-setup@zram0.service, tlp.service, trackpoint.service, udisks2.service, upower.service, wireguard-wg0-peer-server1.service, wireguard-wg0-peer-server2.service, wireguard-wg0.service, wpa_supplicant.service, zfs-mount.service, zfs-share.service, zfs-zed.service
Job for systemd-zram-setup@zram0.service canceled.
NOT restarting the following changed units: greetd.service, systemd-backlight@backlight:amdgpu_bl1.service, systemd-backlight@leds:tpacpi::kbd_backlight.service, systemd-fsck@dev-disk-by\x2duuid-5C85\x2d53D4.service, systemd-journal-flush.service, systemd-logind.service, systemd-random-seed.service, systemd-remount-fs.service, systemd-update-utmp.service, systemd-user-sessions.service, user-runtime-dir@1000.service, user@1000.service
activating the configuration...
[agenix] creating new generation in /run/agenix.d/2
[agenix] decrypting secrets...
decrypting '/nix/store/...-wg-privkey.age' to '/run/agenix.d/2/wg-privkey-laptop'...
[agenix] symlinking new secrets to /run/agenix (generation 2)...
[agenix] removing old secrets (generation 1)...
[agenix] chowning...
setting up /etc...
restarting systemd...
reloading user units for user...
restarting sysinit-reactivation.target
reloading the following units: dbus.service, firewall.service, reload-systemd-vconsole-setup.service
restarting the following units: nix-daemon.service, polkit.service, sshd.service, systemd-journald.service
starting the following units: NetworkManager.service, audit.service, avahi-daemon.socket, bluetooth.service, cups-browsed.service, cups.socket, ensure-printers.service, fwupd.service, kmod-static-nodes.service, logrotate-checkconf.service, mount-pstore.service, network-local-commands.service, network-setup.service, node-red.service, nscd.service, opensnitchd.service, prometheus-node-exporter.service, resolvconf.service, rtkit-daemon.service, systemd-modules-load.service, systemd-oomd.socket, systemd-sysctl.service, systemd-timesyncd.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-vconsole-setup.service, systemd-zram-setup@zram0.service, tlp.service, trackpoint.service, udisks2.service, upower.service, wireguard-wg0-peer-server1.service, wireguard-wg0-peer-server2.service, wireguard-wg0.service, wpa_supplicant.service, zfs-mount.service, zfs-share.service, zfs-zed.service
🚀 ❌ [deploy] [ERROR] Waiting over SSH resulted in a bad exit code: Some(255)
🚀 ℹ️ [deploy] [INFO] Revoking previous deploys
thread '🚀 ❌ [deploy] [ERROR] Deployment failed, rolled back to previous generation
tokio-runtime-worker' panicked at /build/source/src/deploy.rs:488:41:
called `Result::unwrap()` on an `Err` value: SSHActivateExit(Some(255))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I see similar (timeout-related) issues from time to time when updating after a long gap, but I have never before gotten an unhandled error directly from Rust.

I would assume that a 20-minute timeout is enough, but it looks more like the deploy will always fail when certain things are updated (probably network- or SSH-related), no matter the timeout. Note that in this case, as far as I can tell, SSH was connecting through [::1]:22.
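To make the suspicion about SSH more concrete: the `ssh` client exits with the remote command's status, or with 255 if an error occurred on the SSH side. The log above also shows `sshd.service` among the restarted units. Below is a minimal sketch of how the `Option<i32>` in `SSHActivateExit(Some(255))` could be read under that convention. The `SshOutcome` type and the `classify_ssh_exit` function are hypothetical names for illustration and are not part of deploy-rs. A remote command can itself exit with 255, so the classification is only a heuristic.

```rust
/// Hypothetical classification of an ssh child process exit code.
/// Not taken from deploy-rs; names are made up for illustration.
#[derive(Debug, PartialEq)]
enum SshOutcome {
    /// Remote command finished with status 0.
    Success,
    /// ssh itself reported an error (exit status 255), e.g. a dropped connection.
    TransportFailure,
    /// The remote command ran and returned a non-zero status.
    RemoteFailure(i32),
    /// No exit code at all, e.g. the process was terminated by a signal.
    KilledBySignal,
}

fn classify_ssh_exit(code: Option<i32>) -> SshOutcome {
    match code {
        Some(0) => SshOutcome::Success,
        // Heuristic: a remote command may also return 255 on its own.
        Some(255) => SshOutcome::TransportFailure,
        Some(n) => SshOutcome::RemoteFailure(n),
        None => SshOutcome::KilledBySignal,
    }
}

fn main() {
    // The value from the panic message in this report.
    println!("{:?}", classify_ssh_exit(Some(255)));
    // A generic non-zero status from the remote side.
    println!("{:?}", classify_ssh_exit(Some(1)));
}
```

Read this way, `Some(255)` would point at the SSH connection itself rather than at the activation script, which would fit a deploy that restarts the SSH daemon while waiting over SSH.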


freelock · 2024-10-14T17:32:10Z

Hi,

I'm getting something similar trying to deploy to an AWS host.

I do see "Activation succeeded!", but then it times out at "Waiting for confirmation event..." with a 90s confirm-delay.

Running with RUST_BACKTRACE=1, I'm getting this backtrace:

⭐ ❌ [activate] [ERROR] Failed to get activation confirmation: Error waiting for confirmation event: Timeout elapsed for confirmation
thread 'tokio-runtime-worker' panicked at /build/source/src/deploy.rs:488:41:
called `Result::unwrap()` on an `Err` value: SSHActivateExit(Some(1))
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: tokio::runtime::task::harness::poll_future
   4: tokio::runtime::task::raw::poll
   5: tokio::runtime::task::Notified<S>::run
   6: tokio::runtime::thread_pool::worker::Context::run_task
   7: tokio::runtime::task::raw::poll
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread 'main' panicked at /build/source/src/deploy.rs:523:30:
called `Result::unwrap()` on an `Err` value: RecvError(())
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: deploy::cli::run_deploy::{{closure}}
   4: deploy::cli::run::{{closure}}
   5: deploy::main::{{closure}}
   6: deploy::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
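The two panics above look related: a worker unwraps an `Err` and dies, and the main thread then unwraps a `RecvError` because nothing was ever sent. Below is a minimal sketch of the general pattern that avoids both, written with `std::thread` and `std::sync::mpsc` instead of tokio so it runs without extra crates. It is not the actual code in `deploy.rs`. The `DeployError` enum, `activate`, and `run_deploy` are hypothetical stand-ins, and the `WorkerGone` variant is an assumption added for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical error type; only `SSHActivateExit` mirrors a name from the logs.
#[derive(Debug, PartialEq)]
enum DeployError {
    SSHActivateExit(Option<i32>),
    /// Assumed variant: the worker ended without reporting a result.
    WorkerGone,
}

/// Hypothetical stand-in for the remote activation step.
fn activate(exit_code: Option<i32>) -> Result<(), DeployError> {
    match exit_code {
        Some(0) => Ok(()),
        other => Err(DeployError::SSHActivateExit(other)),
    }
}

fn run_deploy(exit_code: Option<i32>) -> Result<(), DeployError> {
    let (tx, rx) = mpsc::channel();

    let worker = thread::spawn(move || {
        // Send the Result itself instead of unwrapping it in the worker,
        // so a failed activation does not panic this thread.
        let _ = tx.send(activate(exit_code));
    });

    // If the sender was dropped without sending, map that to an error
    // value instead of unwrapping the RecvError.
    let outcome = rx.recv().unwrap_or(Err(DeployError::WorkerGone));
    let _ = worker.join();
    outcome
}

fn main() {
    match run_deploy(Some(255)) {
        Ok(()) => println!("deployment confirmed"),
        Err(e) => eprintln!("deployment failed: {:?}", e),
    }
}
```

With this shape, a bad SSH exit code reaches the caller as an ordinary error value that can be logged and turned into a non-zero process exit, rather than surfacing as a panic with a backtrace.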
