Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tokenize & detokenize to client, fix typos #8

Merged
merged 1 commit into from
Nov 30, 2023

Conversation

andreaskoepf
Copy link
Contributor

Added an implementation for the tasks tokenize & detokenize analogous to the existing other API requests

While the Task + Output structs are easily extensible it feels a bit cumbersome for simple requests like tokenize/detokenize. Maybe it would be worth to consider offering simpler functions like async fn detokenize(token_ids: &Vec<u32>) -> Result<String, Error>.

Copy link
Contributor

@pacman82 pacman82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, thanks for the contribution. I have a few nitpicks, please toch them up, before merging.

Design feedback: The API also allows you to fetch a tokenizer and instantiate it locally. Might this fit your usecase even better?

src/detokenization.rs Outdated Show resolved Hide resolved
src/detokenization.rs Outdated Show resolved Hide resolved
}

#[derive(Serialize, Debug)]
struct DetokenizationRequest<'a> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename this to DetokenizationBody.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I renamed it to BodyDetokenization and ResponseDetokenization in the spirit of the existing BodyCompletion and ResponseCompletion.

@@ -215,6 +219,28 @@ impl Client {
.output_of(&task.with_model(model), how)
.await
}

pub async fn tokenize(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minimal example with a comment why you want to call this would be nice. Not required though for me to merge the PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I added docstring examples for tokenize()/detokenize().

src/tokenization.rs Outdated Show resolved Hide resolved
src/tokenization.rs Outdated Show resolved Hide resolved
@andreaskoepf
Copy link
Contributor Author

Hello, thanks for the contribution. I have a few nitpicks, please toch them up, before merging.

Thanks a lot for your review! Hope I could address your comments. I will add docstrings for tokenize/detokenize.

Design feedback: The API also allows you to fetch a tokenizer and instantiate it locally. Might this fit your usecase even better?

That's great. Just saw that the python client supports this (aleph_alpha_client.py#L573C55-L573C79) by fetching /models/{model}/tokenizer. BTW This tokenizer-endpoint isn't mentioned in openapi.yaml yet. Will check out how it works in Rust and maybe create a separate PR.

- implemented client code for the `/tokenize` & `/detokenize` endpoints
- added docstring examples
@andreaskoepf
Copy link
Contributor Author

Squashed the changes into a single commit.

@pacman82 pacman82 merged commit 65d764b into Aleph-Alpha:main Nov 30, 2023
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants