imagenet_256_cc.yml runtime error #9

Open
mateibejan1 opened this issue Jul 25, 2022 · 2 comments

mateibejan1 commented Jul 25, 2022

I'm trying to test the 256 ImageNet model on the deblurring task on the OOD data you provide in your adjacent repository. I'm getting this error:

ERROR - main.py - 2022-07-25 10:25:13,026 - Traceback (most recent call last):
  File "/Users/mbejan/Documents/diffusion/ddrm/main.py", line 164, in main
    runner.sample()
  File "/Users/mbejan/Documents/diffusion/ddrm/runners/diffusion.py", line 161, in sample
    self.sample_sequence(model, cls_fn)
  File "/Users/mbejan/Documents/diffusion/ddrm/runners/diffusion.py", line 249, in sample_sequence
    for x_orig, classes in pbar:
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 438, in __iter__
    return self._get_iterator()
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 384, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1048, in __init__
    w.start()
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'Diffusion.sample_sequence.<locals>.seed_worker'

This is the command that produces the behaviour above:

python main.py --ni \
  --config imagenet_256_cc.yml \
  --doc ood \
  --timesteps 20 \
  --eta 0.85 \
  --etaB 1 \
  --deg deblur_uni \
  --sigma_0 0.05

My imagenet_256_cc.yml is the same as the one you provide, apart from the out_of_distribution argument, which is set to true.
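
For reference, the failure is not specific to this repository: under the spawn start method (the default on macOS and Windows, and what the traceback above shows), each DataLoader worker receives its worker_init_fn by pickling, and a function defined inside another function cannot be pickled by reference. A minimal standalone sketch (not the DDRM code) that reproduces the same AttributeError:

import pickle

def sample_sequence():
    def seed_worker(worker_id):  # nested function, like the one in runners/diffusion.py
        pass
    # Spawn-based DataLoader workers would need to pickle this function; a nested
    # function has no importable module-level name, so this raises
    # AttributeError: Can't pickle local object 'sample_sequence.<locals>.seed_worker'
    pickle.dumps(seed_worker)

sample_sequence()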


lshaw8317 commented Feb 28, 2023

#18 is related. I also had the same error. Adding global seed_worker to Diffusion.sample_sequence in diffusion.py fails to resolve the issue:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\shaw\Anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\shaw\Anaconda3\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'seed_worker' on <module 'runners.diffusion' from 'C:\\Users\\shaw\\Documents\\Year 2\\Diffusion Models\\ddrm\\runners\\diffusion.py'>

The reason (in my case) is that when running on Windows the multiprocessing module uses spawn, and so one must (according to the docs):

Wrap most of your main script's code within an if __name__ == '__main__': block, to make sure it doesn't run again (most likely generating an error) when each worker process is launched. You can place your dataset and DataLoader instance creation logic here, as it doesn't need to be re-executed in workers.

Make sure that any custom collate_fn, worker_init_fn or dataset code is declared as a top-level definition, outside of the __main__ check. This ensures that they are available in worker processes. (This is needed since functions are pickled as references only, not bytecode.)

It is difficult to implement this advice since the seed_worker function needs access to the input args coming from the config file.
Simplest "solution" was to just set the worker_init_fn argument to None as below (within Diffusion.sample_sequence):

val_loader = data.DataLoader(
    test_dataset,
    batch_size=config.sampling.batch_size,
    shuffle=True,
    num_workers=config.data.num_workers,
    worker_init_fn=None,
    generator=g,
)
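
Alternatively (a sketch I have not tested against this repo), seed_worker can be moved to module level in runners/diffusion.py and made independent of the input args by deriving the per-worker seed from torch.initial_seed(), which is the pattern the PyTorch docs recommend; a top-level function can be pickled by reference, so spawn works. This reuses test_dataset, config and g from the snippet above:

import random
import numpy as np
import torch

def seed_worker(worker_id):
    # Inside a worker, torch.initial_seed() already reflects the DataLoader's
    # generator seed plus the worker id, so no access to args/config is needed.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

val_loader = data.DataLoader(
    test_dataset,
    batch_size=config.sampling.batch_size,
    shuffle=True,
    num_workers=config.data.num_workers,
    worker_init_fn=seed_worker,  # top-level function, so it can be pickled under spawn
    generator=g,
)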


LinWeiJeff commented Jun 17, 2024


@lshaw8317 Hello, I have the same problem, which produces the error:

ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'Diffusion.sample_sequence.<locals>.seed_worker'

and I tried your solution of setting the worker_init_fn argument to None. However, after making that change and running the code, the tqdm bar (indicating sampling progress) freezes at 0% for a while (about 20 seconds), and eventually a new error occurs, shown in the picture below:
Screenshot 2024-06-17 155512 (shows the MemoryError)

I don't know why the MemoryError occurs.
Did you encounter this new error? Do you know how to solve it?
If you need more information about how I implemented the code, I am very willing to provide it.
Thanks a lot!
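
(A sketch, not a confirmed fix for the MemoryError: one way to take multiprocessing out of the picture while debugging is num_workers=0, which loads batches in the main process so nothing has to be pickled or spawned. Same names as the snippet above.)

val_loader = data.DataLoader(
    test_dataset,
    batch_size=config.sampling.batch_size,
    shuffle=True,
    num_workers=0,  # single-process data loading; worker_init_fn is not used
    generator=g,
)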
