Some questions about the training code #86

Open
Tonsty opened this issue Dec 2, 2024 · 0 comments

Tonsty commented Dec 2, 2024

**Question 1 - the padding may be assigned in an incorrect order**

During the ContextCrop process, the paddings are assigned as follows:

```python
paddings = [
    max(-left + min(0, right), 0),
    max(bottom - max(h, top), 0),
    max(right - max(w, left), 0),
    max(-top + min(0, bottom), 0),
]
```

which seems to be in the order [left, bottom, right, top], not matching the comment on the following line:

results["paddings"] = paddings # left ,top ,right, bottom

I am not sure whether this matters for training, because the transform operations are performed as follows, which looks correct:

```python
shapes = dict(height=height, width=width, top=top, left=left)
self._transform_img(results, shapes)
if not self.keep_original:
    self._transform_gt(results, shapes)
    self._transform_masks(results, shapes)
else:
```

(It does matter in the validation process, which uses "paddings" and assumes the order left, right, top, bottom:)

```python
depth_gt = inputs["depth"]
image_paddings = [image_metas[0]["paddings"]]
depth_paddings = [image_metas[0]["depth_paddings"]]
predictions = match_gt(
    predictions, depth_gt, padding1=image_paddings, padding2=depth_paddings
)
pred_angles = match_gt(
    pred_angles, depth_gt, padding1=image_paddings, padding2=depth_paddings
)
```

```python
pad1_l, pad1_r, pad1_t, pad1_b = (
    padding1[i] if padding1 is not None else (0, 0, 0, 0)
)
item1_unpadded = item1[:, pad1_t : h1 - pad1_b, pad1_l : w1 - pad1_r]
```
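
A minimal sketch of the mismatch, with made-up padding values (not taken from the repository): if ContextCrop stores [left, bottom, right, top] but match_gt unpacks [left, right, top, bottom], the bottom amount gets removed from the right side and vice versa, and the unpadded tensor ends up with the wrong shape.

```python
import torch

# hypothetical padding amounts for one sample (illustration only)
left, bottom, right, top = 3, 5, 7, 2
paddings = [left, bottom, right, top]      # order apparently written by ContextCrop
pad1_l, pad1_r, pad1_t, pad1_b = paddings  # order assumed by match_gt

h1, w1 = 20, 30
item1 = torch.zeros(1, h1, w1)

# slice produced with the swapped values
mismatched = item1[:, pad1_t : h1 - pad1_b, pad1_l : w1 - pad1_r]
print(mismatched.shape)  # torch.Size([1, 11, 22])

# slice that the comment "left, top, right, bottom" would imply
expected = item1[:, top : h1 - bottom, left : w1 - right]
print(expected.shape)    # torch.Size([1, 13, 20])
```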

**Question 2 - the "fully contained" condition may not be satisfied**

In the following code, the condition of the second branch, "output_ratio / input_ratio * ctx > 1", can never be met, because output_ratio / input_ratio <= 1.0 and ctx < 1.0 at that point. And in the "fully contained" branch, the fully-contained condition may also not be satisfied, because
```
new_h = new_w / output_ratio
      = w * (ctx * output_ratio / input_ratio) ** 0.5 / output_ratio
      = h * input_ratio * (ctx * output_ratio / input_ratio) ** 0.5 / output_ratio
      = h * (ctx * input_ratio / output_ratio) ** 0.5
```

Although ctx < 1.0, input_ratio / output_ratio is larger than 1.0, so new_h may end up larger than h, in which case the crop cannot be fully contained.

```python
if output_ratio <= input_ratio:  # out like 4:3, in like KITTI
    if (
        ctx >= 1
    ):  # fully in -> use just max_length with sqrt(ctx), here max is width
        new_w = w * ctx**0.5
    # sticks out a bit in only one dimension
    # we know that in_width will stick out before in_height, partial overshoot (sporge)
    # new_h > old_h via area -> new_h ** 2 * ratio_new = old_h ** 2 * ratio_old * ctx
    elif output_ratio / input_ratio * ctx > 1:
        new_w = w * ctx
    else:  # fully contained -> use area
        new_w = w * (ctx * output_ratio / input_ratio) ** 0.5
    new_h = new_w / output_ratio
```
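
A quick numeric check of this claim, using made-up but representative numbers (a KITTI-like 375x1242 input, a 4:3 output ratio, and ctx = 0.8), confirms that the "fully contained" branch can produce new_h > h:

```python
h, w = 375, 1242                 # KITTI-like input size (illustration only)
input_ratio = w / h              # ~3.31
output_ratio = 4 / 3             # ~1.33
ctx = 0.8

assert output_ratio <= input_ratio and ctx < 1
# the second branch is indeed unreachable here:
assert not (output_ratio / input_ratio * ctx > 1)

# "fully contained" branch
new_w = w * (ctx * output_ratio / input_ratio) ** 0.5
new_h = new_w / output_ratio
print(round(new_w), round(new_h))  # ~705 x ~529, so new_h > h = 375
```

So the crop would overshoot vertically even though ctx < 1, matching the derivation above.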

**Question 3 - only half of the batched data is used for training?**

nsteps_accumulation_gradient is set to 1 in the training config and batch_chunk is equal to batch_size, but the batched data obtained from the dataloader (batches["data"]) actually has a doubled batch size of 2 * batch_size, because a pair of items is loaded for each index in ConcatDataset.__getitem__ (self.pairs is configured to 2). Thus, only the first half of batches["data"] is used for training.

```python
batch_chunk = batch_size // nsteps_accumulation_gradient
```

UniDepth/scripts/train.py, lines 423 to 430 in 5afc0dc:

```python
for idx in range(nsteps_accumulation_gradient):
    batch = {}
    batch_slice = slice(idx * batch_chunk, (idx + 1) * batch_chunk)
    batch["data"] = {k: v[batch_slice] for k, v in batches["data"].items()}
    batch["img_metas"] = batches["img_metas"][batch_slice]
    # remove the temporal dimension of the dataloader, here it is always 1!
    batch["data"] = {k: v.squeeze(1) for k, v in batch["data"].items()}
```

```python
def __getitem__(self, idxs):
    self.sample_shape()
    return [
        super(ConcatDataset, self).__getitem__(idx)
        for idx in idxs
        for _ in range(self.pairs)
    ]
```
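
A small sketch of the arithmetic, with assumed values (batch_size = 8, pairs = 2) and dummy tensors: the flattened batch holds pairs * batch_size items, but with nsteps_accumulation_gradient = 1 the single slice only ever touches the first batch_size of them.

```python
import torch

batch_size = 8                    # assumed config value, for illustration
pairs = 2                         # self.pairs in ConcatDataset
nsteps_accumulation_gradient = 1
batch_chunk = batch_size // nsteps_accumulation_gradient  # 8

# the collated batch effectively contains pairs * batch_size items
batches = {"data": {"image": torch.zeros(pairs * batch_size, 3, 32, 32)}}

used = set()
for idx in range(nsteps_accumulation_gradient):
    batch_slice = slice(idx * batch_chunk, (idx + 1) * batch_chunk)
    batch = {k: v[batch_slice] for k, v in batches["data"].items()}
    used.update(range(batch_slice.start, batch_slice.stop))

print(f"{len(used)} of {pairs * batch_size} items used")  # 8 of 16
```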

**Question 4 - SequenceDataset is not suitable for training because of the calculation of the SelfDistill loss**

The SelfDistill loss assumes that each pair of images in the chunks shows the same view, i.e. that both items originate from the same image, just with different crop/resize operations. However, for a given idx, the repeated calls to super(ConcatDataset, self).__getitem__(idx) may load two different images from the underlying image sequence, because start is a random value between 0 and num_samples_sequence.

```python
chunks = input.shape[0] // 2
```

```python
def __getitem__(self, idxs):
    self.sample_shape()
    return [
        super(ConcatDataset, self).__getitem__(idx)
        for idx in idxs
        for _ in range(self.pairs)
    ]
```

```python
start = np.random.randint(0, max(1, num_samples_sequence - self.num_frames))
idxs = list(
    range(start, min(num_samples_sequence, self.num_frames + start))
)
keyframe_idx = np.random.randint(0, len(idxs))
else:
```
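
A toy sketch of the concern (the frame bookkeeping below is made up for illustration, not the repository's code): since start is redrawn on every call, the two items produced for the same idx by the pairs loop can come from different frames of the sequence, so the pair fed to the SelfDistill loss is not guaranteed to show the same view.

```python
import numpy as np

num_samples_sequence = 100   # assumed sequence length
num_frames = 1               # assumed, for illustration

def get_item(idx):
    # idx is unused in this toy sketch; the frame within the sequence is random,
    # mimicking the random window selection inside the sequence dataset
    start = np.random.randint(0, max(1, num_samples_sequence - num_frames))
    idxs = list(range(start, min(num_samples_sequence, num_frames + start)))
    keyframe_idx = np.random.randint(0, len(idxs))
    return idxs[keyframe_idx]  # frame index actually loaded by this call

pairs = 2
frames = [get_item(0) for _ in range(pairs)]  # the two "paired" items for idx = 0
print(frames, "same frame" if frames[0] == frames[1] else "different frames")
```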
