Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always enable pthread flags #5196

Closed
wants to merge 2 commits into from

Conversation

fingolfin
Copy link
Member

Resolves #5192

@fingolfin fingolfin added topic: build system release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes labels Nov 7, 2022
configure.ac Show resolved Hide resolved
Copy link
Member

@dimpase dimpase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modulo the typo, see another comment, all is good

configure.ac Outdated Show resolved Hide resolved
@dimpase dimpase self-requested a review November 7, 2022 13:44
@fingolfin
Copy link
Member Author

Hmmm, seems this breaks the expect test. Someone else will have to look into this, I don't have the time and interest right now

@fingolfin
Copy link
Member Author

The output is not super helpful:

Running test suite testexpect
+ case $TEST_SUITE in
+ INPUTRC=/tmp/inputrc
+ expect -c 'spawn ./gap --quitonbreak -A -b  --cover coverage/testexpect.coverage' /home/runner/work/gap/gap/dev/gaptest.expect
spawn ./gap --quitonbreak -A -b --cover coverage/testexpect.coverage
 TIMEOUT 
Error: Process completed with exit code 2.

Of course on my own Linux machines, it just passes with this PR.

I also verified that -pthread is used both in the CI tests and on my own machines; I also made sure to pass --coverage to compiler and linker alike, just as in the CI environment. We can see the effect in the logs:

checking whether pthreads work with "-pthread" and "-lpthread"... yes
checking for joinable pthread attribute... PTHREAD_CREATE_JOINABLE
checking whether more special flags are required for pthreads... no
checking for PTHREAD_PRIO_INHERIT... yes
...
/bin/bash ./libtool --mode=compile --tag CC gcc  -MQ build/obj/src/ariths.c.lo -MMD -MP -MF build/deps/src/ariths.c.d  -pthread --coverage [...]
...
libtool: compile:  gcc -MQ build/obj/src/ariths.c.lo -MMD -MP -MF build/deps/src/ariths.c.d -pthread --coverage [..]

Yet despite all this, the expect test just passes on my own machines, while it gets stuck inside CI...

@dimpase
Copy link
Member

dimpase commented Sep 2, 2023

I'm trying to reproduce this locally, using act. More specifically, I'm using gh act installed as a gh extension

$ gh extension install https://github.com/nektos/gh-act

and running

$ gh act -p false -r -j test

I beleive this should be as good as doing it remotely, but the Docker containers used are saved locally, so one can in principle easily fire up one of these and run things manually.

@dimpase
Copy link
Member

dimpase commented Sep 2, 2023

OK, I can reproduce this in gh act. Here I added -d to the expect options to tell it to output a bit more.
It looks as if GAP never comes to gap> prompt.

| Running test suite testexpect
| + case $TEST_SUITE in
| + INPUTRC=/tmp/inputrc
| + expect -d -c 'spawn ./gap --quitonbreak -A -b  --cover coverage/testexpect.coverage' /home/dimpase/work/software/gap/dev/gaptest.expect
| expect version 5.45.4
| spawn ./gap --quitonbreak -A -b --cover coverage/testexpect.coverage
| parent: waiting for sync byte
| parent: telling child to go ahead
| parent: now unsynchronized from child
| spawn: returns {7327}
| argv[0] = expect  argv[1] = -d  argv[2] = -c  argv[3] = spawn ./gap --quitonbreak -A -b  --cover coverage/testexpect.coverage  argv[4] = /home/dimpase/work/software/gap/dev/gaptest.expect  
| set argc 0
| set argv0 "/home/dimpase/work/software/gap/dev/gaptest.expect"
| set argv ""
| executing commands from command file /home/dimpase/work/software/gap/dev/gaptest.expect
| 
| expect: does "" (spawn_id exp4) match glob pattern "gap> "? no
| expect: timed out
|  TIMEOUT 
[CI/testexpect -  - ubuntu-latest]   ❌  Failure - Main Run tests
[CI/testexpect -  - ubuntu-latest] exitcode '2': failure

I've modified the CI.yml as follows, to only run testexpect.

--- a/.github/workflows/CI.yml
+++ b/.github/workflows/CI.yml
@@ -44,101 +44,9 @@ jobs:
         shell: [bash]
         test-suites:
           [
-            # base test: fast first test
-            "testinstall",
-
-            "teststandard",
-
-            # run tests contained in the manual
-            "testmanuals",
-
             # test error reporting and compiling as well as libgap
-            "testexpect testmockpkg testspecial test-compile testlibgap testkernel",
-
-            # compile packages and run GAP tests
-            # don't use --enable-debug to prevent the tests from taking too long
-            "testpackages testinstall-loadall",
+            "testexpect",
           ]
-        extra: [""]
-
-        # add a few extra tests
-        include:
-          # test creating the manual
-          # also test `make install` here, as that wants the manual, too;
-          # and set NO_COVERAGE=1 to ensure the installed GAP binaries don't
-          # contain references to the build directory
-          - os: ubuntu-latest
-            shell: bash
-            test-suites: "makemanuals testmakeinstall"
-            extra: "GAPPREFIX=/tmp/gapprefix CONFIGFLAGS=\"--prefix=/tmp/gapprefix\" NO_COVERAGE=1"
-
-          - os: ubuntu-20.04
-            shell: bash
-            test-suites: "teststandard"
-            extra: "ABI=32 CONFIGFLAGS=\"\""
-
-          # FIXME: we used to run `teststandard` for HPC-GAP under Travis CI,
-          # but somehow when running on GitHub Actions, it takes almost 4
-          # hours (!) to complete instead of 25 minutes. So for now we just
-          # run testinstall.
-          - os: ubuntu-latest
-            shell: bash
-            test-suites: "testinstall"
-            extra: "HPCGAP=yes ABI=64"
-
-          # this job also tests GAP without readline
-          - os: macos-latest
-            shell: bash
-            test-suites: "testmockpkg testinstall"
-            extra: "BOOTSTRAP_MINIMAL=yes"
-
-          # run bugfix regression tests
-          # Also turn on '--enable-memory-checking' to make sure GAP compiles
-          # with the flag enabled. We do not actually test the memory
-          # checking, as this slows down GAP too much.
-          # Also check ubuntu-latest works.
-          - os: ubuntu-latest
-            shell: bash
-            test-suites: "testbugfix"
-            extra: "CONFIGFLAGS=\"--enable-memory-checking\""
-
-          # out of tree builds -- these are mainly done to verify that the
-          # build system work in this scenario. Since we don't expect the test
-          # results to vary compared to the in-tree builds, we turn off
-          # coverage reporting by setting NO_COVERAGE=1; this has the extra
-          # benefit of also running the tests at least once with the
-          # ReproducibleBehaviour option turned off.
-
-          # The '--enable-valgrind' checks that GAP builds and runs correctly
-          # when compiled with valgrind support. We do not actually run any
-          # tests using valgrind, as it is too slow.
-          - os: ubuntu-latest
-            shell: bash
-            test-suites: "testbuildsys testmockpkg testinstall"
-            extra: "NO_COVERAGE=1 ABI=64 BUILDDIR=out-of-tree
-                    CONFIGFLAGS=\"--enable-valgrind\""
-            packages: "valgrind"
-
-          # same as above, but in 32 bit mode, also turn off debugging (see
-          # elsewhere in this file for an explanation).
-          - os: ubuntu-20.04
-            shell: bash
-            test-suites: "testbuildsys testmockpkg testinstall"
-            extra: "NO_COVERAGE=1 ABI=32 BUILDDIR=out-of-tree CONFIGFLAGS=\"\""
-
-          # test Julia integration
-          - os: ubuntu-20.04
-            shell: bash
-            test-suites: "testinstall"
-            extra: "JULIA=yes CONFIGFLAGS=\"--enable-debug --disable-Werror\""
-
-          - os: windows-2019
-            # The 'run' steps in this job use Cygwin's bash, once it is set up.
-            #   --login: make a login shell (so PATH is set up)
-            #   -o igncr: Accept windows line endings
-            #   {0} : Pass any extra arguments from CI
-            shell: C:\cygwin64\bin\bash.exe --login -o igncr '{0}'
-            test-suites: "testmockpkg testinstall"
 
     env:
       TEST_SUITES: ${{ matrix.test-suites }}

@ChrisJefferson
Copy link
Contributor

I believe this is fixed now. I found this fix completely by accident while fiddling around using gh act (which is a nice thing to learn about). Turns out turning on pthreads is somehow making github CI take a REALLY long time to start GAP, so we need a huge timeout.

Copy link
Member

@dimpase dimpase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, also passes in my gh act.

@james-d-mitchell
Copy link
Contributor

Thanks everyone, much appreciated!

@@ -1,4 +1,4 @@
set timeout 10
set timeout 60
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that's slow. Do you have any idea what it is doing in this time? I.e. what makes it so slow?

I tried time ./gap -A -c 'QUIT;' on a bunch of machine just now, with HPC-GAP and debug mode enabled, and it still starts within ~2 seconds on the slowest I could find (WSL inside a Windows VM).

We've also in the past disabled some HPC-GAP tests because they were so super slow in the VM and could never figure out why... This seems to be related?

@fingolfin
Copy link
Member Author

While the testexpect issue is "resolved" (yay!) this still can't be merged as now three other CI jobs fail due to timeouts: whereas they normally take 25-30 minutes, they now did not finish after 1 hour. Something fishy is going on here.

I'll force another CI run to see if this is persistent (even if all tests passt, please do not merge blindly but rather let's verify CI job times have not increased a lot)

@dimpase
Copy link
Member

dimpase commented Sep 4, 2023

the order of the libraries linked/loaded might play a role. We recently saw weirdness related to this in sagemath/sage#31572

@dimpase
Copy link
Member

dimpase commented Sep 4, 2023

and here sagemath/sage#34850

@dimpase
Copy link
Member

dimpase commented Sep 4, 2023

it can be specific to glibc. Do you see such a slowdown in macOS instead of ubuntu-latest? or on Void linux (which has musl libc) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes topic: build system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

in some cases, GAP must be built with -pthread in LDFLAGS
4 participants