I'm using the anomalib library for anomaly detection on a custom dataset, and I'm running into an issue where the library is unable to find the mask files even though they are present in the directory structure. I'm getting an AssertionError with the following message: "missing mask files, mask_dir=/workspace/mydata/mydata/ground_truth".
I have checked that the mask files are correctly labeled and present in the ground_truth folder. The error occurs in the anomalib/data/folder.py file. Here is the relevant part of my config YAML:
```yaml
dataset:
  name: mvtec
  format: folder
  path: mydata/mydata
  normal_dir: train/good
  abnormal_dir: test
  normal_test_dir: test/good
  mask: ground_truth/broken
  extensions: ['.png', '.PNG']
  task: segmentation
  train_batch_size: 32
  eval_batch_size: 32
  num_workers: 8
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: null # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: from_dir # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: padim
  backbone: resnet18
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  normalization_method: min_max # options: [none, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  threshold:
    method: adaptive # options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null # options: torch, onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 1
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1
  limit_val_batches: 1
  limit_test_batches: 1
  limit_predict_batches: 1
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: gpu # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
```
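For reference, the failing setup can be reproduced outside the Trainer by instantiating the Folder datamodule directly. This is a sketch under my assumptions: the paths correspond to the mask_dir reported in the error message, and the exact Folder constructor signature may differ between anomalib versions:

```python
from anomalib.data.folder import Folder

# Sketch only: paths match the mask_dir from the error message; the
# constructor signature may differ between anomalib versions.
datamodule = Folder(
    root="/workspace/mydata/mydata",
    normal_dir="train/good",
    abnormal_dir="test/anomaly1",
    normal_test_dir="test/good",
    mask_dir="ground_truth/anomaly1",
    image_size=256,
    task="segmentation",
)
datamodule.setup()  # this is where the AssertionError is raised
```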
I have also checked the naming convention of the images in the test set against the images in the ground_truth folder, and they follow what the make_folder_dataset function in the folder.py script expects. My understanding is that make_folder_dataset expects each mask file to have the same filename as its corresponding image file, with a _mask suffix appended: for example, if the image file is img001.png, the corresponding mask file should be named img001_mask.png. The file names in the anomaly1 directory under ground_truth and in the anomaly1 directory under test follow this convention.
Steps taken:
I have checked that the mask files are correctly labeled and present in the ground_truth folder. I have also tried changing the path and using both relative and absolute paths, but the error persists.
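To narrow things down, I also reproduced the existence check from make_folder_dataset in a standalone snippet. The path logic mirrors lines 158 and 162 of folder.py as shown in the traceback below (mask_path = mask_dir / rel_image_path); the concrete paths, the *.png pattern, and the assumption that rel_image_path is relative to abnormal_dir are mine:

```python
from pathlib import Path

# Assumed paths for one anomaly category; adjust to your layout.
root = Path("/workspace/mydata/mydata")
abnormal_dir = root / "test" / "anomaly1"
mask_dir = root / "ground_truth" / "anomaly1"

# make_folder_dataset builds each mask path as mask_dir / rel_image_path
# (my reading: rel_image_path is the image path relative to abnormal_dir)
# and then asserts that every such file exists.
missing = []
for image_path in sorted(abnormal_dir.glob("*.png")):
    rel_image_path = image_path.relative_to(abnormal_dir)
    mask_path = mask_dir / rel_image_path
    if not mask_path.exists():
        missing.append(mask_path)

print(f"{len(missing)} missing mask files")
for path in missing[:10]:
    print(path)
```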
Dataset: Other
Model: PaDiM
I am using the latest anomalib version, installed with `!pip install git+https://github.com/openvinotoolkit/anomalib.git`. My Python version is 3.10.
The complete traceback is as follows:
```
AssertionError Traceback (most recent call last)
Cell In[83], line 1
----> 1 trainer.fit(model=model, datamodule=datamodule)
File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:608, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
606 model = self._maybe_unwrap_optimized(model)
607 self.strategy._lightning_module = model
--> 608 call._call_and_handle_interrupt(
609 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
610 )
File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:38, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
36 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
37 else:
---> 38 return trainer_fn(*args, **kwargs)
40 except _TunerExitException:
41 trainer._call_teardown_hook()
File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:650, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
643 ckpt_path = ckpt_path or self.resume_from_checkpoint
644 self._ckpt_path = self._checkpoint_connector._set_ckpt_path(
645 self.state.fn,
646 ckpt_path, # type: ignore[arg-type]
647 model_provided=True,
648 model_connected=self.lightning_module is not None,
649 )
--> 650 self._run(model, ckpt_path=self.ckpt_path)
652 assert self.state.stopped
653 self.training = False
File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1051, in Trainer._run(self, model, ckpt_path)
1048 self.strategy.setup_environment()
1049 self.__setup_profiler()
-> 1051 self._call_setup_hook() # allow user to setup lightning_module in accelerator environment
1053 # check if we should delay restoring checkpoint till later
1054 if not self.strategy.restore_checkpoint_after_setup:
File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1298, in Trainer._call_setup_hook(self)
1295 self.strategy.barrier("pre_setup")
1297 if self.datamodule is not None:
-> 1298 self._call_lightning_datamodule_hook("setup", stage=fn)
1299 self._call_callback_hooks("setup", stage=fn)
1300 self._call_lightning_module_hook("setup", stage=fn)
File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1375, in Trainer._call_lightning_datamodule_hook(self, hook_name, *args, **kwargs)
1373 if callable(fn):
1374 with self.profiler.profile(f"[LightningDataModule]{self.datamodule.__class__.__name__}.{hook_name}"):
-> 1375 return fn(*args, **kwargs)
File /opt/conda/lib/python3.10/site-packages/anomalib/data/base/datamodule.py:102, in AnomalibDataModule.setup(self, stage)
96 """Setup train, validation and test data.
97
98 Args:
99 stage: str | None: Train/Val/Test stages. (Default value = None)
100 """
101 if not self.is_setup:
--> 102 self._setup(stage)
103 assert self.is_setup
File /opt/conda/lib/python3.10/site-packages/anomalib/data/base/datamodule.py:118, in AnomalibDataModule._setup(self, _stage)
115 assert self.train_data is not None
116 assert self.test_data is not None
--> 118 self.train_data.setup()
119 self.test_data.setup()
121 self._create_test_split()
File /opt/conda/lib/python3.10/site-packages/anomalib/data/base/dataset.py:161, in AnomalibDataset.setup(self)
159 """Load data/metadata into memory."""
160 if not self.is_setup:
--> 161 self._setup()
162 assert self.is_setup, "setup() should set self._samples"
File /opt/conda/lib/python3.10/site-packages/anomalib/data/folder.py:233, in FolderDataset._setup(self)
231 def _setup(self) -> None:
232 """Assign samples."""
--> 233 self.samples = make_folder_dataset(
234 root=self.root,
235 normal_dir=self.normal_dir,
236 abnormal_dir=self.abnormal_dir,
237 normal_test_dir=self.normal_test_dir,
238 mask_dir=self.mask_dir,
239 split=self.split,
240 extensions=self.extensions,
241 )
File /opt/conda/lib/python3.10/site-packages/anomalib/data/folder.py:162, in make_folder_dataset(normal_dir, root, abnormal_dir, normal_test_dir, mask_dir, split, extensions)
158 samples.loc[index, "mask_path"] = str(mask_dir / rel_image_path)
160 # make sure all the files exist
161 # samples.image_path does NOT need to be checked because we build the df based on that
--> 162 assert samples.mask_path.apply(
163 lambda x: Path(x).exists() if x != "" else True
164 ).all(), f"missing mask files, mask_dir={mask_dir}"
166 # Ensure the pathlib objects are converted to str.
167 # This is because torch dataloader doesn't like pathlib.
168 samples = samples.astype({"image_path": "str"})
AssertionError: missing mask files, mask_dir=/workspace/mydata/mydata/ground_truth/anomaly1
```
What could be the root cause of this error? Does the folder.py script support a custom dataset where there is more than one anomaly category?
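For what it's worth, my assumption (based on how mask_path is constructed in the traceback above) is that with abnormal_dir pointing at a folder containing several anomaly subfolders, the mask directory would have to mirror those subfolders, with each mask keeping the same relative name as its test image. A hypothetical layout, not my exact files:

```
mydata/mydata/
├── train/
│   └── good/            # normal training images
├── test/
│   ├── good/            # normal test images
│   ├── anomaly1/        # img001.png, ...
│   └── anomaly2/
└── ground_truth/
    ├── anomaly1/        # img001.png (mask, same relative name as the test image)
    └── anomaly2/
```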
I would be grateful for any help.