Add a test timeout #779

ernestum · 2023-09-08T11:30:48Z

Description

Adds timeout of 590 seconds to the tests. This should abort tests before CircleCI interrupts the pipeline thereby allowing us to see which test took so long.

ALSO:

Fix notebook running error. #790 and #779.

docs/tutorials/3_train_gail.ipynb: 800k can bring running time exceed error
docs/tutorials/4_train_airl.ipynb: env.reset doesn't require seed in gymnasium setting.
docs/tutorials/5a_train_preference_comparisons_with_cnn.ipynb: migrate to gymnasium
docs/tutorials/8a_train_sqil_sac.ipynb: v0 is not supported.

Testing

Only test parameter changes.

AdamGleave · 2023-09-08T16:27:55Z

Thanks for this!

Looks like one of the Ray tests is failing. That's not too surprising, I think Ray can be buggy on Windows, we may want to consider just skipping that test on Windows

ernestum · 2023-09-08T17:02:02Z

Good point. Added a skipif on the test with the timeout.

AdamGleave · 2023-09-11T00:32:57Z

Looks like its timing out in test_train_preference_comparisons_main. Odd since stdout/stderr suggests that training completes so I think it must be hanging on something after that. Maybe in writing TensorBoard logs as some threads seem to be stuck there?

Skipping that test as well would be OK as a hotfix, but if can replicate that test failure via running as SSH can hopefully determine a clean resolution.

ernestum · 2023-09-13T09:57:40Z

Possibly related: https://discuss.pytorch.org/t/got-stuck-at-tensorboardx-event-file-writer-py/112209

I updated to the newest tensorboard/tensorboardX version in the SSH session and re-ran the tests. Now there is no timeout any more but the tests fail due to a different reason:

Some file is not found:

FileNotFoundError: [Errno 2] No such file or directory: b'C:\\Users\\circleci.PACKER-64C83029\\AppData\\Local\\Temp\\pytest-of-circleci\\pytest-3\\test_train_preference_comparis0\\train_preference_comparisons\\CartPole-v1\\20230913_094320_90ea37\\log\\events.o
ut.tfevents.1694598202.packer-64c83029-cd75-bf3a-2e04-0f23beb7d9d9.7076.0'

(click to expand)


___________ test_train_preference_comparisons_main[plain] ___________ 
tmpdir = local('C:\Users\circleci.PACKER-64C83029\AppData\Local\Temp\pytest-of-circleci\pytest-3\test_train_preference_comparis0'), preference_comparison_config = {}
def test_train_preference_comparisons_main(tmpdir, preference_comparison_config): 
    config_updates = dict(logging=dict(log_root=tmpdir))
    sacred.utils.recursive_update(config_updates, preference_comparison_config)


  run = train_preference_comparisons.train_preference_comparisons_ex.run(


        # Note: we have to use the cartpole and not the seals_cartpole config because
        #  the seals_cartpole config needs rollouts with a fixed horizon,
        #  and the saved trajectory rollouts are variable horizon.
        named_configs=["cartpole"] + ALGO_FAST_CONFIGS["preference_comparison"],
        config_updates=config_updates,
    )

tests\scripts\test_scripts.py:160:

venv\lib\site-packages\sacred\experiment.py:276: in run

run()

venv\lib\site-packages\sacred\run.py:238: in call

self.result = self.main_function(*args)

venv\lib\site-packages\sacred\config\captured_function.py:42: in captured_function

result = wrapped(*args, **kwargs)

venv\lib\site-packages\imitation\scripts\train_preference_comparisons.py:167: in train_preference_comparisons

custom_logger, log_dir = logging_ingredient.setup_logging()

venv\lib\site-packages\sacred\config\captured_function.py:42: in captured_function

result = wrapped(*args, **kwargs)

venv\lib\site-packages\imitation\scripts\ingredients\logging.py:114: in setup_logging

custom_logger = imit_logger.configure(

venv\lib\site-packages\imitation\util\logger.py:413: in configure

output_formats = _build_output_formats(folder, format_strs)

venv\lib\site-packages\imitation\util\logger.py:67: in _build_output_formats

output_formats.append(make_output_format(f, str(folder)))

venv\lib\site-packages\imitation\util\logger.py:44: in make_output_format

return sb_logger.make_output_format(_format, log_dir, log_suffix)

venv\lib\site-packages\stable_baselines3\common\logger.py:462: in make_output_format

return TensorBoardOutputFormat(log_dir)

venv\lib\site-packages\stable_baselines3\common\logger.py:398: in init

self.writer = SummaryWriter(log_dir=folder)

venv\lib\site-packages\torch\utils\tensorboard\writer.py:247: in init

self._get_file_writer()

venv\lib\site-packages\torch\utils\tensorboard\writer.py:277: in _get_file_writer

self.file_writer = FileWriter(

venv\lib\site-packages\torch\utils\tensorboard\writer.py:76: in init

self.event_writer = EventFileWriter(

venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:100: in init

self.flush()

venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:125: in flush

self._async_writer.flush()

venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:194: in flush

self._check_worker_status()

venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:212: in _check_worker_status

raise exception

C:\Python38\lib\threading.py:932: in _bootstrap_inner

self.run()

venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:244: in run

self._run()

venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:275: in _run

self._record_writer.write(data)

venv\lib\site-packages\tensorboard\summary\writer\record_writer.py:40: in write

self._writer.write(header + header_crc + data + footer_crc)

venv\lib\site-packages\tensorboard\compat\tensorflow_stub\io\gfile.py:768: in write

self.fs.write(self.filename, file_content, self.binary_mode)

venv\lib\site-packages\tensorboard\compat\tensorflow_stub\io\gfile.py:157: in write

self._write(filename, file_content, "wb" if binary_mode else "w")

self = <tensorboard.compat.tensorflow_stub.io.gfile.LocalFileSystem object at 0x000002B784756550>

filename = b'C:\Users\circleci.PACKER-64C83029\AppData\Local\Temp\pytest-of-circleci\pytest-3\test_train_preference_compa...ole-v1\20230913_094320_90ea37\log\events.out.tfevents.1694598202.packer-64c83029-cd75-bf3a-2e04-0f23beb7d9d9.7076.0'

file_content = b'H\x00\x00\x00\x00\x00\x00\x00\xfe\x92H\x9d\te\xfd\x80\x0ea@\xd9A\x1a\rbrain.Event:2R.\n,tensorboard.summary.writer.event_file_writer\xc8p\xa0\xa0', mode = 'wb'
def _write(self, filename, file_content, mode):
    encoding = None if "b" in mode else "utf8"


  with io.open(filename, mode, encoding=encoding) as f:


E       FileNotFoundError: [Errno 2] No such file or directory: b'C:\Users\circleci.PACKER-64C83029\AppData\Local\Temp\pytest-of-circleci\pytest-3\test_train_preference_comparis0\train_preference_comparisons\CartPole-v1\20230913_094320_90ea37\log\events.o

ut.tfevents.1694598202.packer-64c83029-cd75-bf3a-2e04-0f23beb7d9d9.7076.0'

venv\lib\site-packages\tensorboard\compat\tensorflow_stub\io\gfile.py:171: FileNotFoundError ---- Captured log call ---- WARNING rl:initialize.py:217 Added new config entry: "rl_kwargs.batch_size" WARNING rl:initialize.py:217 Added new config entry: "rl_kwargs.n_epochs" WARNING train_preference_comparisons:run.py:458 No observers have been added to this run ERROR train_preference_comparisons:run.py:392 Failed after 0:00:00!

A manual check revealed, that the path DOES exist 🤷

ZiyueWang25 · 2023-10-01T15:51:53Z

I double checked. The file indeed doesn't exists, the directory exists. So I experiment a bit and found the issue:

linux file path limit 4096
windows file path limit 256

It turns out the file length is right on the limit.

>>> len("C:\\Users\\circleci.PACKER-64C83029\\AppData\\Local\\Temp\\pytest-of-circleci\\pytest-4\\test_train_preference_comparis0\\train_preference_comparisons\\CartPole-v1\\20231001_144824_a538c8\\log\\events.out.tfevents.1696171705.packer-64c83029-cd75-bf3a-2e04-0f23beb7d9d9.6640.0")
262

So I am checking whether removing the path limit this way can work or not.

ZiyueWang25 · 2023-10-01T17:21:33Z

The ModuleNotFoundError: No module named 'gymnasium' error pops up weirdly. Not sure why it happened after this change.
If I do pip install gymnasium==0.28.1 and then run the test pytest tests/algorithms/test_dagger.py::test_simple_dagger_space_mismatch_error[None-False] , it shows error:

    def check_for_correct_spaces(env: GymEnv, observation_space: spaces.Space, action_space: spaces.Space) -> None:
        """
        Checks that the environment has same spaces as provided ones. Used by BaseAlgorithm to check if
        spaces match after loading the model with given env.
        Checked parameters:
        - observation_space
        - action_space

        :param env: Environment to check for valid spaces
        :param observation_space: Observation space to check against
        :param action_space: Action space to check against
        """
        if observation_space != env.observation_space:
>           raise ValueError(f"Observation spaces do not match: {observation_space} != {env.observation_space}")
E           ValueError: Observation spaces do not match: Box([-1. -1. -8.], [1. 1. 8.], (3,), float32) != Box([-1. -1. -8.], [1. 1. 8.], (3,), float32)

ZiyueWang25 · 2023-10-01T18:45:42Z

so the original error is gone. I noticed two new issues here:

#791 & #790

Note: #790 can represent the issues for:

FAILED tests/test_examples.py::test_run_tutorial_notebooks[C:\\Users\\circleci.PACKER-64C83029\\project\\tests\\..\\docs\\tutorials\\5a_train_preference_comparisons_with_cnn.ipynb]
FAILED tests/test_examples.py::test_run_tutorial_notebooks[C:\\Users\\circleci.PACKER-64C83029\\project\\tests\\..\\docs\\tutorials\\4_train_airl.ipynb]
FAILED tests/test_examples.py::test_run_tutorial_notebooks[C:\\Users\\circleci.PACKER-64C83029\\project\\tests\\..\\docs\\tutorials\\8a_train_sqil_sac.ipynb]
FAILED tests/test_examples.py::test_run_tutorial_notebooks[C:\\Users\\circleci.PACKER-64C83029\\project\\tests\\..\\docs\\tutorials\\3_train_gail.ipynb

And #791 is for:

FAILED tests/algorithms/test_sqil.py::test_sqil_performance_continuous[DDPG]

This reverts commit a83df13, reversing changes made to afdba28.

ZiyueWang25 · 2023-10-02T16:21:57Z

So now the issue is windows platform-wise issue:

need to fix the numpy randint bound issue: high is out of bounds for int32
need to pip install gymnasium[mujoco]

ZiyueWang25 · 2023-10-02T16:52:00Z

The numpy issue can be replicated

Hopefully it can be resolved by specifying the seed.

ZiyueWang25 · 2023-10-02T17:48:09Z

So it shows a timeout issue saying that tests/test_examples.py::test_run_tutorial_notebooks[C:\\Users\\circleci.PACKER-64C83029\\project\\tests\\..\\docs\\tutorials\\4_train_airl.ipynb] cannot pass within 5 minutes. It can pass on my computer within 3 minutes. I think this is due to win32 has less resources. The current solution is to make the timeout larger.

Created feature request for this: #793

ZiyueWang25 · 2023-10-02T18:03:50Z

The numpy issue can be replicated

Hopefully it can be resolved by specifying the seed.

Actually this is solved in stable_baselines3 here, I think we need to set stable-baselines3~=2.0 to stable-baselines3>=2.0 to keep the package updated.

codecov · 2023-10-02T18:23:51Z

Codecov Report

Merging #779 (d5026b6) into master (8db6bfb) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head d5026b6 differs from pull request most recent head 99cd7fd. Consider uploading reports for the commit 99cd7fd to get more accurate results

@@           Coverage Diff           @@
##           master     #779   +/-   ##
=======================================
  Coverage   96.33%   96.33%           
=======================================
  Files          98       98           
  Lines        9172     9177    +5     
=======================================
+ Hits         8836     8841    +5     
  Misses        336      336

Files	Coverage Δ
tests/algorithms/test_sqil.py	`100.00% <100.00%> (ø)`
tests/scripts/test_scripts.py	`100.00% <100.00%> (ø)`
tests/test_examples.py	`100.00% <ø> (ø)`

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

AdamGleave

Thanks for these changes! Overall looks almost ready, a few clarifying questions & suggestions, please tag me for a re-review once addressed.

AdamGleave · 2023-10-03T00:43:51Z

.circleci/config.yml

@@ -176,13 +176,28 @@ commands:
            .\venv\Scripts\activate
            pip install --upgrade --force-reinstall --no-deps .
          shell: powershell.exe
+
+      - run:
+          name: install gymansium[mujoco]


Why do we need to explicitly install Gymnasium? Shouldn't this dependency be picked up by pip install . above?

Also confused why we need the MuJoCo dependency -- do any of our tests use MuJoCo?

The main thing is about the mujoco part. It gets involved in the windows environment from import seals in the notebook and mainly here. If I don't do this, the notebook will fail.

It is directly mentioned in Seals as well. Not quite sure why it shows up only in the Windows environment and only in notebook.

I did some digging here and the TL;DR is that the notebooks were only getting tested on Windows. The SQIL notebook depends on HalfCheetah which is a MuJoCo environment, so it makes sense we'd need thiis dependency. We made a decision a while back to have MuJoCo dependency be optional for the codebase since it's large and can be somewhat tricky to install, so I'm loathe to reverse that especially just special-casing it for the Windows CI run. I'd suggest we instead switch out HalfCheetah for something like MountainCarContinuous that does not require extra dependencies -- tracking in #798

Note the gym.register line here should succeed even if MuJoCo is not present -- it will only fail when attempting to instantiate those environments. Indeed,

This doesn't error on unit-test-linux since we're skipping all notebooks: https://github.com/HumanCompatibleAI/imitation/blob/master/tests/test_examples.py#L43 This was because they were meant to be run in readthedocs after #565 This run is happening but the notebooks failing is just a warning:

/home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/docs/tutorials/8a_train_sqil_sac.ipynb: WARNING: Executing notebook failed: CellExecutionError [mystnb.exec] /home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/docs/tutorials/8a_train_sqil_sac.ipynb: WARNING: Notebook exception traceback saved in: /home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/_readthedocs/html/reports/tutorials/8a_train_sqil_sac.err.log [mystnb.exec]

I've opened #799 to track this

Mac we are not running test_examples.py either due to a separate bug, I'll address in #797

Thank you for the detailed explanation! That makes a lot sense!

.circleci/config.yml

docs/tutorials/4_train_airl.ipynb

tests/algorithms/test_sqil.py

tests/test_examples.py

AdamGleave

The notebooks have a bunch of spurious newlines introduced -- can you please clean up that diff as it will otherwise make commit history/git blame messy in future.

MuJoCo dependency I think should not be necessary although I now understand (see comment) why test was failing without it -- this has unrooted a deeper problem in our CI that I'll get the team to work on addressing in other PRs/issues.

AdamGleave · 2023-10-04T00:22:21Z

.circleci/config.yml

@@ -176,13 +176,28 @@ commands:
            .\venv\Scripts\activate
            pip install --upgrade --force-reinstall --no-deps .
          shell: powershell.exe
+
+      - run:
+          name: install gymansium[mujoco]


I did some digging here and the TL;DR is that the notebooks were only getting tested on Windows. The SQIL notebook depends on HalfCheetah which is a MuJoCo environment, so it makes sense we'd need thiis dependency. We made a decision a while back to have MuJoCo dependency be optional for the codebase since it's large and can be somewhat tricky to install, so I'm loathe to reverse that especially just special-casing it for the Windows CI run. I'd suggest we instead switch out HalfCheetah for something like MountainCarContinuous that does not require extra dependencies -- tracking in #798

Note the gym.register line here should succeed even if MuJoCo is not present -- it will only fail when attempting to instantiate those environments. Indeed,

This doesn't error on unit-test-linux since we're skipping all notebooks: https://github.com/HumanCompatibleAI/imitation/blob/master/tests/test_examples.py#L43 This was because they were meant to be run in readthedocs after #565 This run is happening but the notebooks failing is just a warning:

/home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/docs/tutorials/8a_train_sqil_sac.ipynb: WARNING: Executing notebook failed: CellExecutionError [mystnb.exec] /home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/docs/tutorials/8a_train_sqil_sac.ipynb: WARNING: Notebook exception traceback saved in: /home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/_readthedocs/html/reports/tutorials/8a_train_sqil_sac.err.log [mystnb.exec]

I've opened #799 to track this

Mac we are not running test_examples.py either due to a separate bug, I'll address in #797

AdamGleave · 2023-10-04T00:23:42Z

docs/tutorials/10_train_custom_env.ipynb

@@ -5,18 +5,17 @@
   "metadata": {},


Newlines seem to have changed throughout this and many other notebooks making the diff messy. Can you clean this up? Most of the notebooks shouldn't be changing in this PR, and I can't tell right now which notebooks had actual semantic changes.

Sorry about this. Yes, I did some semantic changes and then the lint keeps failing on docs/tutorials/4_train_airl.ipynb. I did something wrong and formatted all notebook. I should have just formatted that only one.

I have reverted the commit and now the notebook change should be easy to read.

AdamGleave

On a closer look I see you made some non-trivial changes to the notebooks (not just newlines) and the newlines seem to have been introduced by black reformatting (perhaps a newer version is more opinionated?) so I won't insist on that change.

I'm removing the MuJoCo dependency as that shouldn't be in our CI. This will make it red again but I'll go ahead and merge if the only test that's failing is for 8a_train_sqil_sac; we should fix that separately in #800 by changing the notebook

This reverts commit 99cd7fd.

This reverts commit d5026b6.

This reverts commit b199b98.

AdamGleave

LGTM

ernestum added 2 commits September 8, 2023 13:29

Add a test timeout.

41d4d17

Add a test timeout.

243db63

Skip ray tests on windows.

ce1f8c6

ernestum force-pushed the add_test_timeout branch from db464fe to ce1f8c6 Compare September 8, 2023 17:01

ernestum mentioned this pull request Sep 18, 2023

Gymnasium Compatibility #735

Merged

3 tasks

Remove path limit in Windows environment

86f877d

From unit-test-windows -> dependencies-windows

1a3323f

Merge remote-tracking branch 'origin/master' into add_test_timeout

afdba28

ZiyueWang25 mentioned this pull request Oct 1, 2023

tests/algorithms/test_sqil.py::test_sqil_performance_continuous[DDPG] failure #791

Closed

ZiyueWang25 added 5 commits October 1, 2023 20:16

Update tutorial notebooks

7eb9dbc

Merge branch 'notebook_fix' into add_test_timeout

a83df13

Revert "Merge branch 'notebook_fix' into add_test_timeout"

1e3370d

This reverts commit a83df13, reversing changes made to afdba28.

Update tutorial notebooks

08bd76a

Remove output from 3_train_gail.ipynb

423ff85

ZiyueWang25 mentioned this pull request Oct 2, 2023

Notebook fix #792

Closed

ZiyueWang25 added 4 commits October 1, 2023 20:35

Merge branch 'notebook_fix' into add_test_timeout

3f19d95

Adjust rl_kwargs to pass sqil performance test

26a730e

customize rl parameters for different algo in test

1d927af

Increase timeout to 5 min and shorten a NB runtime

224ed28

Resolve np randint high bound issue

cddc094

Increase the notebook timeout from 5 mins to 9

f88dc0f

ZiyueWang25 requested a review from AdamGleave October 2, 2023 18:21

AdamGleave reviewed Oct 3, 2023

View reviewed changes

add github issue number for tracking

9aa5c47

ZiyueWang25 requested a review from AdamGleave October 3, 2023 03:03

ZiyueWang25 added 4 commits October 2, 2023 20:57

rename env to venv, add seed

96cf438

change name to "enable long path" for clarity

e91513f

reformat notebook

b199b98

reformat notebook use black-jupyter

d5026b6

AdamGleave requested changes Oct 4, 2023

View reviewed changes

Remove MuJoCo install

fdc1c03

AdamGleave approved these changes Oct 4, 2023

View reviewed changes

AdamGleave and others added 6 commits October 3, 2023 18:02

Invalidate MacOS cache

99cd7fd

Revert "Invalidate MacOS cache"

74fa89a

This reverts commit 99cd7fd.

Invalidate Windows dependencies

776c535

Revert "reformat notebook use black-jupyter"

3e70bb0

This reverts commit d5026b6.

Revert "reformat notebook"

3a2a8e2

This reverts commit b199b98.

reformat 4_train_airl notebook only

c2a5d13

ernestum mentioned this pull request Oct 4, 2023

Fix MacOS Pipeline: Include tests not in subdirectories #797

Merged

AdamGleave approved these changes Oct 5, 2023

View reviewed changes

AdamGleave merged commit 4e429ce into master Oct 5, 2023
1 of 2 checks passed

AdamGleave deleted the add_test_timeout branch October 5, 2023 01:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a test timeout #779

Add a test timeout #779

ernestum commented Sep 8, 2023 •

edited

Loading

AdamGleave commented Sep 8, 2023

ernestum commented Sep 8, 2023

AdamGleave commented Sep 11, 2023

ernestum commented Sep 13, 2023 •

edited

Loading

ZiyueWang25 commented Oct 1, 2023

ZiyueWang25 commented Oct 1, 2023

ZiyueWang25 commented Oct 1, 2023

ZiyueWang25 commented Oct 2, 2023

ZiyueWang25 commented Oct 2, 2023

ZiyueWang25 commented Oct 2, 2023

ZiyueWang25 commented Oct 2, 2023

codecov bot commented Oct 2, 2023 •

edited

Loading

AdamGleave left a comment

AdamGleave Oct 3, 2023

ZiyueWang25 Oct 3, 2023

ZiyueWang25 Oct 3, 2023

AdamGleave Oct 4, 2023

ZiyueWang25 Oct 4, 2023

AdamGleave left a comment

AdamGleave Oct 4, 2023

AdamGleave Oct 4, 2023

ZiyueWang25 Oct 4, 2023

AdamGleave left a comment

AdamGleave left a comment

Add a test timeout #779

Add a test timeout #779

Conversation

ernestum commented Sep 8, 2023 • edited Loading

Description

Testing

AdamGleave commented Sep 8, 2023

ernestum commented Sep 8, 2023

AdamGleave commented Sep 11, 2023

ernestum commented Sep 13, 2023 • edited Loading

ZiyueWang25 commented Oct 1, 2023

ZiyueWang25 commented Oct 1, 2023

ZiyueWang25 commented Oct 1, 2023

ZiyueWang25 commented Oct 2, 2023

ZiyueWang25 commented Oct 2, 2023

ZiyueWang25 commented Oct 2, 2023

ZiyueWang25 commented Oct 2, 2023

codecov bot commented Oct 2, 2023 • edited Loading

Codecov Report

AdamGleave left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AdamGleave left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AdamGleave left a comment

Choose a reason for hiding this comment

AdamGleave left a comment

Choose a reason for hiding this comment

ernestum commented Sep 8, 2023 •

edited

Loading

ernestum commented Sep 13, 2023 •

edited

Loading

codecov bot commented Oct 2, 2023 •

edited

Loading