Add Unigram tokenizer needed by T5 and FLAN-T5 model families #8089

Merged: 5 commits, Jun 25, 2024

Commits on Jun 24, 2024

  1. llama : add T5 model architecture, tensors and model header parameters

    llama : add Unigram tokenizer
    sszymczy committed Jun 24, 2024
    c2c799c
  2. llama : add handling of byte tokens in UGM tokenizer (same as in SPM)

    llama : fix preventing crashes when precompiled_charsmap is not present
    sszymczy committed Jun 24, 2024
    f4c03c0

Commits on Jun 25, 2024

  1. llama : replace allocated precompiled_charsmap buffer with std::vector to avoid memory leak
    sszymczy committed Jun 25, 2024
    87b7dd2
  2. 21d3684
  3. f23ff91