Train speech language ID classification head #450

Open · wants to merge 91 commits into base: main
Conversation

@am831 am831 commented May 9, 2024

Implements a new training script for a language-ID classification head. The new nn.Module in model.py is separate from the other components of M4T; it takes the speech encoder output and maps it to language probabilities with a softmax. All layers of M4T are frozen so that those components are not trained. A new dataset.py script downloads Common Voice from Hugging Face.
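A minimal sketch of such a head, assuming a mean-pooled encoder output and hypothetical names and sizes (`LangIdHead`, `embed_dim=1024`, 6 languages) that are not the PR's actual code:

```python
import torch
import torch.nn as nn

class LangIdHead(nn.Module):
    """Hypothetical language-ID head: maps speech-encoder output
    to per-language probabilities. Names/sizes are assumptions."""

    def __init__(self, embed_dim: int, n_langs: int, num_layers: int = 4):
        super().__init__()
        layers = []
        for _ in range(num_layers - 1):
            layers += [nn.Linear(embed_dim, embed_dim), nn.ReLU()]
        layers.append(nn.Linear(embed_dim, n_langs))
        self.mlp = nn.Sequential(*layers)

    def forward(self, encoder_out: torch.Tensor) -> torch.Tensor:
        # encoder_out: (batch, seq_len, embed_dim); mean-pool over time,
        # then project to language probabilities with a softmax.
        pooled = encoder_out.mean(dim=1)
        return torch.softmax(self.mlp(pooled), dim=-1)

head = LangIdHead(embed_dim=1024, n_langs=6)
probs = head(torch.randn(2, 50, 1024))  # (batch=2, n_langs=6)
```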

Most recent results from training:

Samples in train dataset:

- eng - 647 samples
- fra - 676 samples
- deu - 862 samples
- rus - 775 samples
- spa - 908 samples
- hin - 418 samples
- Total - 4286 samples

Samples in eval dataset:

- eng - 394 samples
- fra - 289 samples
- deu - 363 samples
- rus - 356 samples
- spa - 408 samples
- hin - 239 samples
- Total - 2049 samples

Parameters:

- batch_size 32
- learning_rate 0.001
- patience 10
- num_layers 4

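One mixed-precision training step with these hyperparameters might look like the sketch below; the module names (`head`, feature sizes) are stand-ins, not the PR's actual code:

```python
import torch
import torch.nn as nn
from torch.optim import AdamW

# Hypothetical stand-in for the classification head (6 languages).
head = nn.Linear(16, 6)
optimizer = AdamW(head.parameters(), lr=0.001)  # learning_rate 0.001
grad_scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(32, 16)          # batch_size 32
labels = torch.randint(0, 6, (32,))     # one language label per sample

# Forward pass under autocast; the scaler handles loss scaling on GPU
# and degrades to plain fp32 steps on CPU.
with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
    loss = loss_fn(head(features), labels)

grad_scaler.scale(loss).backward()
grad_scaler.step(optimizer)
grad_scaler.update()
optimizer.zero_grad()
```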
(image: training results plot)

I am collaborating with @zrthxn on this project

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 9, 2024

grad_scaler = torch.cuda.amp.GradScaler()
optimizer = AdamW(
    params=frozen_model.parameters(),
We want to optimize the head, not the frozen model.
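The fix the reviewer is pointing at could be sketched as follows; `frozen_model` and `head` are stand-in modules, not the PR's actual classes:

```python
import torch.nn as nn
from torch.optim import AdamW

frozen_model = nn.Linear(8, 8)  # stand-in for the frozen M4T encoder
head = nn.Linear(8, 6)          # stand-in for the new classification head

# Freeze the base model so no gradients flow into it...
for p in frozen_model.parameters():
    p.requires_grad = False

# ...and give the optimizer only the head's parameters.
optimizer = AdamW(params=head.parameters(), lr=1e-3)
```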

zrthxn and others added 11 commits May 11, 2024 00:07
* Dataloader

* Training loop fixes

* One Hot encode language labels

* Remove `dist_utils`

* Remove duplicate option

* remove label smoothing

* Device class

* Batching

* Add float_dtype

* Add padding mask

* Add padding mask

* Device

* Model shape

* Optimize head
* Small fixes

* Argument

* Batch size

* Model shape

* unsqueeze

* Long

* Float tensor

* Float tensor

* Device

* While set

* src_lengths

* Refactor
@am831 am831 marked this pull request as ready for review June 1, 2024 18:51
4 participants