
Purpose of "smart_batching_collate" #2956

Closed
ShengYun-Peng opened this issue Sep 24, 2024 · 2 comments

@ShengYun-Peng

What is "smart_batching_collate" mainly used for, and how does it differ from the default collate function that only takes the tokenizer and pads the sentences in a batch?

@tomaarsen
Collaborator

Hello!

Good question. smart_batching_collate is implemented in fit_mixin.py, a file that exists to inject some old methods into the SentenceTransformer class. To clarify, training SentenceTransformer models used to be done using a fit method. In the v3 refactor we moved to primarily using a SentenceTransformerTrainer instead, but I didn't want to break backwards compatibility, so I updated the fit method to use the new Trainer.

But, because I thought there might still be some edge cases where people really want to use the original fit method, I also kept the original fit method, now named old_fit. This old_fit method will be removed in a future version, and fit will also be deprecated at some point; then we'll fully move to only the Trainer.

The smart_batching_collate method is a leftover from the original fit method, now implemented in old_fit. To be specific, it's used here:

# Use smart batching
for dataloader in dataloaders:
    dataloader.collate_fn = self.smart_batching_collate

So, this collator is used in the DataLoader instances. It is used to convert the InputExample instances that the DataLoader would return into torch tensors. The DataLoader (from torch) has no idea what to do with InputExample instances otherwise, so this collator was required.
Why was it called "smart"? No idea. It doesn't seem particularly smart to me, haha; it seems pretty standard.
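For illustration only, here is a minimal sketch of what a collator like this does. The `Example` class and the `collate` function below are hypothetical stand-ins, not the library's actual code: the real smart_batching_collate additionally runs the model's tokenizer on each text column and returns torch tensors, while this sketch stops at the transposed Python lists.

```python
from dataclasses import dataclass

# Hypothetical stand-in for sentence_transformers.InputExample:
# each example holds N parallel texts (e.g. [anchor, positive]) and a label.
@dataclass
class Example:
    texts: list
    label: float = 0.0

def collate(batch):
    """Transpose a list of examples into per-column text lists plus labels.

    A plain torch DataLoader would not know how to batch Example objects,
    which is why a custom collate_fn is needed at all.
    """
    num_texts = len(batch[0].texts)
    columns = [[ex.texts[i] for ex in batch] for i in range(num_texts)]
    labels = [ex.label for ex in batch]
    return columns, labels

batch = [
    Example(["a cat", "a feline"], 1.0),
    Example(["a dog", "a car"], 0.0),
]
columns, labels = collate(batch)
# columns -> [["a cat", "a dog"], ["a feline", "a car"]]
# labels  -> [1.0, 0.0]
```

Each column list can then be tokenized as one padded batch, which is the part the default collate function (tokenizer + padding) handles on its own.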

In short: it exists so that the old training method (now under old_fit) still works. It'll be removed in the future, perhaps with the v4 release or at some point sooner.

Hope that clears it up a bit.

  • Tom Aarsen

@ShengYun-Peng
Author

Thanks for the clear explanation! Haha, it is good to learn the development history. I'll close the issue.
