Add tokenize & detokenize to client, fix typos #8

andreaskoepf · 2023-11-28T15:33:50Z

Added an implementation of the tokenize & detokenize tasks, analogous to the other existing API requests.

While the Task + Output structs are easily extensible, they feel a bit cumbersome for simple requests like tokenize/detokenize. It might be worth considering simpler functions such as `async fn detokenize(token_ids: &Vec<u32>) -> Result<String, Error>`.
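To illustrate the suggestion, here is a minimal sketch of a convenience wrapper layered on top of a task-based call. Everything in it is an assumption made for illustration, not the crate's actual API: `MockClient`, `TaskDetokenization`, `DetokenizationOutput`, `detokenize_task`, the `Error` enum, and the tiny `block_on` helper are hypothetical, and the mock resolves token ids from a local table instead of sending an HTTP request.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum Error {
    UnknownToken(u32),
}

/// Hypothetical task struct, mirroring the Task + Output pattern.
pub struct TaskDetokenization<'a> {
    pub token_ids: &'a [u32],
}

/// Hypothetical output struct belonging to the task above.
pub struct DetokenizationOutput {
    pub result: String,
}

/// Stand-in for the real client: looks token ids up in a local table
/// instead of calling the `/detokenize` endpoint.
pub struct MockClient {
    vocab: HashMap<u32, &'static str>,
}

impl MockClient {
    pub fn new() -> Self {
        let vocab = HashMap::from([(1, "Hello"), (2, " world")]);
        Self { vocab }
    }

    /// Task-based call: the caller builds a task and unpacks an output struct.
    pub async fn detokenize_task(
        &self,
        task: &TaskDetokenization<'_>,
    ) -> Result<DetokenizationOutput, Error> {
        let mut result = String::new();
        for &id in task.token_ids {
            let piece = self.vocab.get(&id).ok_or(Error::UnknownToken(id))?;
            result.push_str(piece);
        }
        Ok(DetokenizationOutput { result })
    }

    /// Convenience wrapper: hides the task and output structs from the caller.
    pub async fn detokenize(&self, token_ids: &[u32]) -> Result<String, Error> {
        let task = TaskDetokenization { token_ids };
        Ok(self.detokenize_task(&task).await?.result)
    }
}

/// Waker that does nothing; sufficient because the mock futures never suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor that polls a future in a loop until it completes.
/// A real application would use an async runtime instead.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

The wrapper takes `&[u32]` rather than `&Vec<u32>`; a `&Vec<u32>` coerces to a slice automatically, so callers holding a vector can pass it unchanged. With the mock above, `block_on(MockClient::new().detokenize(&[1, 2]))` yields `Ok("Hello world".to_string())`.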

pacman82

Hello, thanks for the contribution. I have a few nitpicks; please touch them up before merging.

Design feedback: The API also allows you to fetch a tokenizer and instantiate it locally. Might this fit your use case even better?

src/detokenization.rs

pacman82 · 2023-11-30T07:05:44Z

src/detokenization.rs

+}
+
+#[derive(Serialize, Debug)]
+struct DetokenizationRequest<'a> {


Please rename this to DetokenizationBody.

OK, I renamed it to BodyDetokenization and ResponseDetokenization in the spirit of the existing BodyCompletion and ResponseCompletion.

pacman82 · 2023-11-30T07:08:31Z

src/lib.rs

@@ -215,6 +219,28 @@ impl Client {
            .output_of(&task.with_model(model), how)
            .await
    }
+
+    pub async fn tokenize(


A minimal example with a comment explaining why you would want to call this would be nice. It is not required for me to merge the PR, though.

OK, I added docstring examples for tokenize()/detokenize().

src/tokenization.rs

andreaskoepf · 2023-11-30T11:21:46Z

> Hello, thanks for the contribution. I have a few nitpicks; please touch them up before merging.

Thanks a lot for your review! I hope I have addressed your comments. I will add docstrings for tokenize/detokenize.

> Design feedback: The API also allows you to fetch a tokenizer and instantiate it locally. Might this fit your use case even better?

That's great. I just saw that the Python client supports this (aleph_alpha_client.py#L573C55-L573C79) by fetching /models/{model}/tokenizer. BTW, this tokenizer endpoint isn't mentioned in openapi.yaml yet. I will check out how it works in Rust and maybe create a separate PR.

- implemented client code for the `/tokenize` & `/detokenize` endpoints
- added docstring examples

andreaskoepf · 2023-11-30T12:27:51Z

Squashed the changes into a single commit.

benbrandt requested a review from pacman82 November 28, 2023 17:48

pacman82 requested changes Nov 30, 2023

andreaskoepf force-pushed the main branch from a08bb32 to 9dd9557 Compare November 30, 2023 10:54

Add tokenize & detokenize to client, fix typos

e2b37f8

- implemented client code for the `/tokenize` & `/detokenize` endpoints
- added docstring examples

andreaskoepf force-pushed the main branch from cf33535 to e2b37f8 Compare November 30, 2023 12:26

pacman82 merged commit 65d764b into Aleph-Alpha:main Nov 30, 2023
1 check failed
