Fix embedding use of tokenizer to avoid already borrowed #369
Conversation
Under high concurrency, the fast (Rust) tokenizer hits "Already borrowed" exceptions when the truncation and padding (and max length) params are changed, so do not change those. Instead, detect where to truncate (or return an error message) using the default settings plus some extra code, truncating the texts and re-tokenizing when needed. In addition, the error message, which used to report the number of tokens over the limit (but did not say which sentence), now returns the index(es) of the sentence(s) found to be too long.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
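A minimal sketch of that detect-then-truncate flow, assuming a Hugging Face fast tokenizer; the model name, function name, and variables here are illustrative, not the PR's actual code:

```python
from transformers import AutoTokenizer

# Hypothetical model name; any fast (Rust-backed) tokenizer behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
max_len = tokenizer.model_max_length

def tokenize_checked(texts, truncate=False):
    # First pass with default settings only, so the Rust tokenizer's shared
    # truncation/padding state is never mutated (the "Already borrowed" trigger).
    encodings = tokenizer(texts)
    too_long = [i for i, ids in enumerate(encodings["input_ids"]) if len(ids) > max_len]
    if not too_long:
        return encodings
    if not truncate:
        # Report which sentences were too long, not just how far over the limit.
        raise ValueError(f"Inputs at index(es) {too_long} exceed {max_len} tokens")
    # Cut the offending raw texts using character offsets, then re-tokenize.
    # (A real implementation must also account for special tokens.)
    truncated = list(texts)
    for i in too_long:
        end_char = encodings[i].offsets[max_len - 1][1]
        truncated[i] = texts[i][:end_char]
    return tokenizer(truncated)
```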
Force-pushed from f07a634 to 1c09a02
The tokenizer is not completely thread-safe. This change copies the tokenizer for each thread (per model) to avoid future problems related to this. It eats up some memory, but a small amount compared to the models. Note: for the embeddings server, we've been recommending 5 threads because more is slower anyway.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
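A sketch of that per-thread copy pattern, assuming a Hugging Face tokenizer held inside each wrapped model instance; the class and attribute names are illustrative, not the PR's actual identifiers:

```python
import threading
from copy import deepcopy

class EmbeddingModule:
    """Hypothetical wrapper; the per-thread tokenizer map is the point."""

    def __init__(self, tokenizer):
        self._base_tokenizer = tokenizer
        self._tokenizers = {}  # thread id -> that thread's private tokenizer copy

    def _get_tokenizer(self):
        # Each thread works on its own copy, so no two threads ever touch the
        # Rust tokenizer's mutable state concurrently.
        thread_id = threading.get_ident()
        if thread_id not in self._tokenizers:
            self._tokenizers[thread_id] = deepcopy(self._base_tokenizer)
        return self._tokenizers[thread_id]

    def tokenize(self, texts):
        return self._get_tokenizer()(texts)
```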
```python
return self.tokenizer(
```

```python
# Keep copies of tokenizer per thread (in each wrapped model instance)
thread_id = threading.get_ident()
```
One other primitive to take a look at here is threading.local.
I tried using threading.local instead of a map, but realized that each thread would still need a map to get the correct tokenizer per model. So I preferred a map keyed by thread in the model instance over a map keyed by model name (id?) in thread-local storage. I gave up experimenting with threading.local, but was hoping it had some advantage here that I did not get to. Is a map in thread local better somehow for this?
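For comparison, a sketch of the threading.local variant discussed here; as noted, each thread-local object would still need its own per-model map (names are illustrative):

```python
import threading
from copy import deepcopy

_local = threading.local()  # attributes on this object are isolated per thread

def get_tokenizer(model_id, base_tokenizer):
    # Even with threading.local, each thread still needs a dict keyed by
    # model to find the right tokenizer -- a map either way.
    cache = getattr(_local, "tokenizers", None)
    if cache is None:
        cache = _local.tokenizers = {}
    if model_id not in cache:
        cache[model_id] = deepcopy(base_tokenizer)
    return cache[model_id]
```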
Ah, makes sense!
LGTM