Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failures for 1.5.0 on Alpine Linux #90

Open
orhun opened this issue Jul 24, 2023 · 20 comments
Open

Test failures for 1.5.0 on Alpine Linux #90

orhun opened this issue Jul 24, 2023 · 20 comments

Comments

@orhun
Copy link
Contributor

orhun commented Jul 24, 2023

Hello! 🐻

I've been trying to package tere for Alpine Linux and I created a merge request here: https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/49128

As you can see from the failed pipelines, some tests are failing:

failures:
---- app_state::tests::test_case_sensitive_mode_change stdout ----
thread 'app_state::tests::test_case_sensitive_mode_change' panicked at 'assertion failed: `(left == right)`
  left: `[2]`,
 right: `[1]`', src/app_state.rs:1711:9
failures:
    app_state::tests::test_case_sensitive_mode_change
test result: FAILED. 90 passed; 1 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.05s

Also, I get the following failures even though I skip that one test:

failures:

---- panic_guard::tests::test_callback_called_before_panic_hook stdout ----
thread 'panic_guard::tests::test_callback_called_before_panic_hook' panicked at 'assertion failed: `(left == right)`
  left: `["cleanup", "hook", "hook", "hook"]`,
 right: `["cleanup", "hook"]`', src/panic_guard.rs:135:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- panic_guard::tests::test_callback_called_once_only_panic stdout ----
thread 'panic_guard::tests::test_callback_called_once_only_panic' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', src/panic_guard.rs:99:36

---- panic_guard::tests::test_nested_callback stdout ----
thread 'panic_guard::tests::test_nested_callback' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', src/panic_guard.rs:140:36

---- panic_guard::tests::test_nested_callback_hook_restored stdout ----
thread 'panic_guard::tests::test_nested_callback_hook_restored' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', src/panic_guard.rs:180:36

---- panic_guard::tests::test_nested_callback_with_panic stdout ----
thread 'panic_guard::tests::test_nested_callback_with_panic' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', src/panic_guard.rs:161:36


failures:
    panic_guard::tests::test_callback_called_before_panic_hook
    panic_guard::tests::test_callback_called_once_only_panic
    panic_guard::tests::test_nested_callback
    panic_guard::tests::test_nested_callback_hook_restored
    panic_guard::tests::test_nested_callback_with_panic

test result: FAILED. 85 passed; 5 failed; 1 ignored; 0 measured; 1 filtered out; finished in 0.05s

error: test failed, to rerun pass '--bin tere'

I am skipping these tests for now since I am not sure what is happening there.

Weirdly enough, all the tests (including those ones) pass on Arch Linux.

Let me know if I can provide more info to debug this.

Thanks for this great tool! 💯

@mgunyho
Copy link
Owner

mgunyho commented Jul 24, 2023

Hmm, interesting. I can't tell offhand what is causing those, but probably something to do with threading; the panic_guard tests should not run in parallel (they modify the global panic hook), so they use a mutex.

About the first failure, is it somehow possible that files in Alpine are not case-sensitive? Although that sounds like something very fundamental to Unix that shouldn't be different.

@orhun
Copy link
Contributor Author

orhun commented Jul 24, 2023

ping @nekopsykose (Alpine Linux maintainer), maybe they know something about it :/

@nekopsykose
Copy link

About the first failure, is it somehow possible that files in Alpine are not case-sensitive?

that's a filesystem thing and nothing in this case has casefolding

@nekopsykose
Copy link

running them locally they all pass, funny..

@nekopsykose
Copy link

ah, it's flaky. it toggles between:

  • all pass
  • these fail:
    failures:
      panic_guard::tests::test_nested_callback_hook_restored
      panic_guard::tests::test_nested_callback_with_panic
    
  • these:
    failures:
      panic_guard::tests::test_callback_called_before_panic_hook
      panic_guard::tests::test_callback_called_on_drop
      panic_guard::tests::test_callback_called_once_only_panic
      panic_guard::tests::test_nested_callback
      panic_guard::tests::test_nested_callback_hook_restored
      panic_guard::tests::test_nested_callback_with_panic
    

this just sounds like some ordering that is broken

@nekopsykose
Copy link

ah. the reason they cascade like that is because your mutex doesn't unlock on fail. so whatever test fails first, causes the rest to fail on lock (poisoned dead mutex)

the first failure is always

---- panic_guard::tests::test_callback_called_before_panic_hook stdout ----
thread 'panic_guard::tests::test_callback_called_before_panic_hook' panicked at 'assertion failed: `(left == right)`
  left: `["cleanup", "hook", "hook", "hook"]`,
 right: `["cleanup", "hook"]`', src/panic_guard.rs:135:9
--- panic_guard::tests::test_nested_callback_hook_restored stdout ----
thread 'panic_guard::tests::test_nested_callback_hook_restored' panicked at 'assertion failed: `(left == right)`
  left: `["inner inner cleanup", "inner cleanup", "outer hook", "outer hook"]`,
 right: `["inner inner cleanup", "inner cleanup", "outer hook"]`', src/panic_guard.rs:214:9

etc. seems that something races appending

@nekopsykose
Copy link

i don't know how to fix that, but maybe you have a clue.

your mutex doesn't unlock on fail

the default Mutex cannot unlock on a thread panic (what your failures do), it always gets poisoned. you should see if there's another implementation :)

@mgunyho
Copy link
Owner

mgunyho commented Jul 24, 2023

Thanks! Yeah, the poisoning is kind of on purpose, if it happens then something is wrong anyway so I didn't bother cleaning it up.

I don't really know what's happening there that's causing the callbacks to be called multiple times. Maybe I'm doing something stupid with the locking/calls mutex inside the callbacks.

@nekopsykose
Copy link

Yeah, the poisoning is kind of on purpose, if it happens then something is wrong anyway so I didn't bother cleaning it up.

that makes sense. just a little noisy, but the first failure is all that mattered anyway.

@mgunyho
Copy link
Owner

mgunyho commented Apr 5, 2024

Could this be resolved now? I believe the panic_guard tests shouldn't fail anymore after an update to rustc, and app_state::tests::test_case_sensitive_mode_change should have been fixed in 8cadf56.

@orhun
Copy link
Contributor Author

orhun commented Apr 5, 2024

Are you planning to create a release soon? It would be nice to activate the tests again along with it 😊

@mgunyho
Copy link
Owner

mgunyho commented Apr 6, 2024

Ah yes I didn't realize that I would need to make a new release. However, these panic guard test failures still seem to appear with rustc version 1.77, both on develop and master 😩 I will really need to try and address this.

I can only reproduce this on my work computer (ubuntu 22) but not on my laptop (fedora 39), and it only happens occasionally. It happens a little bit more often if I force a re-link by removing the unit test binary, like

rm target/debug/deps/tere-25a48097920d5416 && cargo test

(the name of the binary is shown in the first few lines of the output of cargo test)

This suggests that it can have something to do with the CPU being busy, which would make sense as this seems to be due to a race condition.

@mgunyho
Copy link
Owner

mgunyho commented Sep 15, 2024

@timon-schelling re: #104 (comment)

Do you know what the environment (distro / rustc version) in the codespace is? I would like to investigate this if it can be reliably reproduced. I tried now on Debian 11 (in a container) and there at least the issue doesn't appear.

@mgunyho
Copy link
Owner

mgunyho commented Sep 15, 2024

I tried this also with an Alpine Linux container (using Distrobox) and I can't reproduce it there either (on my laptop, which I mentioned above that I couldn't reproduce anyway).

@mgunyho
Copy link
Owner

mgunyho commented Sep 15, 2024

Okay, I was able to confirm that it's a data race issue by running the thread sanitizer. I installed the nightly toolchain, and then run with

RUSTFLAGS="-Zsanitizer=thread" cargo +nightly test --target=x86_64-unknown-linux-gnu

and it reports data races in those tests. That shouldn't happen in safe Rust...

@timon-schelling
Copy link
Contributor

I'm unable to reproduce this, tried on multiple codespaecs and my nix machine.
But the codespace I head the issue on head rustc 1.83.0-nightly (c2f74c3f9 2024-09-09).

@timon-schelling
Copy link
Contributor

and it reports data races in those tests. That shouldn't happen in safe Rust...

I guess a library that is using some unsafe code dose not have a save api?

@timon-schelling
Copy link
Contributor

RUSTFLAGS="-Zsanitizer=thread" cargo +nightly test --target=x86_64-unknown-linux-gnu

But I don't see any non std or tere line in the warning. 🤔

@mgunyho
Copy link
Owner

mgunyho commented Sep 15, 2024

The panic_guard module only uses std::sync::Arc and Mutex. I don't exactly know how the test are run, but I suppose messing with the panic hook in multiple tests running in parallel is pretty unusual, and maybe during a panic (where I'm doing stuff) some of the safety guarantees don't apply.

But I will try to investigate with an MWE to see if this is truly a bug in std.

@mgunyho
Copy link
Owner

mgunyho commented Sep 23, 2024

I looked into this again. Basically, tests run in parallel and concurrently modify the panic hook, see for example these discussions:

If I restrict the tests to a single thread with cargo test -- --test-threads 1 or run only tests in the panic_guard module which is using the mutex to avoid data races, the issue doesn't occur. So the only way to avoid this would be to have a mutex across all tests, which is not feasible.

At least it seems like the issue is with the tests and not with my code being tested, which is a relief. But I don't know of a good way to resolve this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants