Some questions about the training code #86

Open
Tonsty opened this issue Dec 2, 2024 · 0 comments

Tonsty commented Dec 2, 2024

**Question 1 - the padding may be assigned in an incorrect order**

During the ContextCrop process, the paddings are assigned as follows:

```python
paddings = [
    max(-left + min(0, right), 0),
    max(bottom - max(h, top), 0),
    max(right - max(w, left), 0),
    max(-top + min(0, bottom), 0),
]
```

which seems to be in the order [left, bottom, right, top], not matching the comment on the following line:

results["paddings"] = paddings # left ,top ,right, bottom

I am not sure whether this matters for training, because the transform operations are performed as follows, which looks correct:

```python
shapes = dict(height=height, width=width, top=top, left=left)
self._transform_img(results, shapes)
if not self.keep_original:
    self._transform_gt(results, shapes)
    self._transform_masks(results, shapes)
else:
```

(It does matter in the validation process, which uses "paddings" and assumes the order left, right, top, bottom:)

```python
depth_gt = inputs["depth"]
image_paddings = [image_metas[0]["paddings"]]
depth_paddings = [image_metas[0]["depth_paddings"]]
predictions = match_gt(
    predictions, depth_gt, padding1=image_paddings, padding2=depth_paddings
)
pred_angles = match_gt(
    pred_angles, depth_gt, padding1=image_paddings, padding2=depth_paddings
)
```

```python
pad1_l, pad1_r, pad1_t, pad1_b = (
    padding1[i] if padding1 is not None else (0, 0, 0, 0)
)
item1_unpadded = item1[:, pad1_t : h1 - pad1_b, pad1_l : w1 - pad1_r]
```
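
A minimal sketch of the mismatch, with made-up padding values (not taken from the repository): if ContextCrop stores [left, bottom, right, top] but match_gt unpacks [left, right, top, bottom], the bottom amount gets removed from the right side and vice versa, and the unpadded tensor ends up with the wrong shape.

```python
import torch

# hypothetical padding amounts for one sample (illustration only)
left, bottom, right, top = 3, 5, 7, 2
paddings = [left, bottom, right, top]      # order apparently written by ContextCrop
pad1_l, pad1_r, pad1_t, pad1_b = paddings  # order assumed by match_gt

h1, w1 = 20, 30
item1 = torch.zeros(1, h1, w1)

# slice produced with the swapped values
mismatched = item1[:, pad1_t : h1 - pad1_b, pad1_l : w1 - pad1_r]
print(mismatched.shape)  # torch.Size([1, 11, 22])

# slice that the comment "left, top, right, bottom" would imply
expected = item1[:, top : h1 - bottom, left : w1 - right]
print(expected.shape)    # torch.Size([1, 13, 20])
```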

**Question 2 - the "fully contained" condition may not be satisfied**

In the following code, the condition of the second branch, "output_ratio / input_ratio * ctx > 1", can never be met, because output_ratio / input_ratio <= 1.0 and ctx < 1.0 at that point. And in the "fully contained" branch, the fully-contained condition may also not be satisfied, because
```
new_h = new_w / output_ratio
      = w * (ctx * output_ratio / input_ratio) ** 0.5 / output_ratio
      = h * input_ratio * (ctx * output_ratio / input_ratio) ** 0.5 / output_ratio
      = h * (ctx * input_ratio / output_ratio) ** 0.5
```

Although ctx < 1.0, input_ratio / output_ratio is larger than 1.0, so new_h may end up larger than h, in which case the crop cannot be fully contained.

```python
if output_ratio <= input_ratio:  # out like 4:3, in like KITTI
    if (
        ctx >= 1
    ):  # fully in -> use just max_length with sqrt(ctx), here max is width
        new_w = w * ctx**0.5
    # sticks out a bit in only one dimension
    # we know that in_width will stick out before in_height, partial overshoot (sporge)
    # new_h > old_h via area -> new_h ** 2 * ratio_new = old_h ** 2 * ratio_old * ctx
    elif output_ratio / input_ratio * ctx > 1:
        new_w = w * ctx
    else:  # fully contained -> use area
        new_w = w * (ctx * output_ratio / input_ratio) ** 0.5
    new_h = new_w / output_ratio
```
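
A quick numeric check of this claim, using made-up but representative numbers (a KITTI-like 375x1242 input, a 4:3 output ratio, and ctx = 0.8), confirms that the "fully contained" branch can produce new_h > h:

```python
h, w = 375, 1242                 # KITTI-like input size (illustration only)
input_ratio = w / h              # ~3.31
output_ratio = 4 / 3             # ~1.33
ctx = 0.8

assert output_ratio <= input_ratio and ctx < 1
# the second branch is indeed unreachable here:
assert not (output_ratio / input_ratio * ctx > 1)

# "fully contained" branch
new_w = w * (ctx * output_ratio / input_ratio) ** 0.5
new_h = new_w / output_ratio
print(round(new_w), round(new_h))  # ~705 x ~529, so new_h > h = 375
```

So the crop would overshoot vertically even though ctx < 1, matching the derivation above.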

**Question 3 - only half of the batched data is used for training?**

nsteps_accumulation_gradient is set to 1 in the training config and batch_chunk is equal to batch_size, but the batched data obtained from the dataloader (batches["data"]) actually has a doubled batch size of 2 * batch_size, because a pair of items is loaded for each index in ConcatDataset.__getitem__ (self.pairs is configured to 2). Thus, only the first half of batches["data"] is used for training.

```python
batch_chunk = batch_size // nsteps_accumulation_gradient
```

UniDepth/scripts/train.py, lines 423 to 430 in 5afc0dc:

```python
for idx in range(nsteps_accumulation_gradient):
    batch = {}
    batch_slice = slice(idx * batch_chunk, (idx + 1) * batch_chunk)
    batch["data"] = {k: v[batch_slice] for k, v in batches["data"].items()}
    batch["img_metas"] = batches["img_metas"][batch_slice]
    # remove the temporal dimension of the dataloader, here it is always 1!
    batch["data"] = {k: v.squeeze(1) for k, v in batch["data"].items()}
```

```python
def __getitem__(self, idxs):
    self.sample_shape()
    return [
        super(ConcatDataset, self).__getitem__(idx)
        for idx in idxs
        for _ in range(self.pairs)
    ]
```
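
A small sketch of the arithmetic, with assumed values (batch_size = 8, pairs = 2) and dummy tensors: the flattened batch holds pairs * batch_size items, but with nsteps_accumulation_gradient = 1 the single slice only ever touches the first batch_size of them.

```python
import torch

batch_size = 8                    # assumed config value, for illustration
pairs = 2                         # self.pairs in ConcatDataset
nsteps_accumulation_gradient = 1
batch_chunk = batch_size // nsteps_accumulation_gradient  # 8

# the collated batch effectively contains pairs * batch_size items
batches = {"data": {"image": torch.zeros(pairs * batch_size, 3, 32, 32)}}

used = set()
for idx in range(nsteps_accumulation_gradient):
    batch_slice = slice(idx * batch_chunk, (idx + 1) * batch_chunk)
    batch = {k: v[batch_slice] for k, v in batches["data"].items()}
    used.update(range(batch_slice.start, batch_slice.stop))

print(f"{len(used)} of {pairs * batch_size} items used")  # 8 of 16
```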

**Question 4 - SequenceDataset is not suitable for training because of the calculation of the SelfDistill loss**

The SelfDistill loss assumes that each pair of images in the chunks shows the same view, i.e. that both items originate from the same image, just with different crop/resize operations. However, for a given idx, the repeated calls to super(ConcatDataset, self).__getitem__(idx) may load two different images from the underlying image sequence, because start is a random value between 0 and num_samples_sequence.

```python
chunks = input.shape[0] // 2
```

```python
def __getitem__(self, idxs):
    self.sample_shape()
    return [
        super(ConcatDataset, self).__getitem__(idx)
        for idx in idxs
        for _ in range(self.pairs)
    ]
```

```python
start = np.random.randint(0, max(1, num_samples_sequence - self.num_frames))
idxs = list(
    range(start, min(num_samples_sequence, self.num_frames + start))
)
keyframe_idx = np.random.randint(0, len(idxs))
else:
```
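
A toy sketch of the concern (the frame bookkeeping below is made up for illustration, not the repository's code): since start is redrawn on every call, the two items produced for the same idx by the pairs loop can come from different frames of the sequence, so the pair fed to the SelfDistill loss is not guaranteed to show the same view.

```python
import numpy as np

num_samples_sequence = 100   # assumed sequence length
num_frames = 1               # assumed, for illustration

def get_item(idx):
    # idx is unused in this toy sketch; the frame within the sequence is random,
    # mimicking the random window selection inside the sequence dataset
    start = np.random.randint(0, max(1, num_samples_sequence - num_frames))
    idxs = list(range(start, min(num_samples_sequence, num_frames + start)))
    keyframe_idx = np.random.randint(0, len(idxs))
    return idxs[keyframe_idx]  # frame index actually loaded by this call

pairs = 2
frames = [get_item(0) for _ in range(pairs)]  # the two "paired" items for idx = 0
print(frames, "same frame" if frames[0] == frames[1] else "different frames")
```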
