`@huggingface/gguf`

A GGUF parser that works on remotely hosted files.

Spec

Spec: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md

Reference implementation (Python): https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/gguf_reader.py

Install

npm install @huggingface/gguf

Usage

Basic usage

import { GGMLQuantizationType, gguf } from "@huggingface/gguf";

// remote GGUF file from https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF
const URL_LLAMA = "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/191239b/llama-2-7b-chat.Q2_K.gguf";

const { metadata, tensorInfos } = await gguf(URL_LLAMA);

console.log(metadata);
// {
//     version: 2,
//     tensor_count: 291n,
//     kv_count: 19n,
//     "general.architecture": "llama",
//     "general.file_type": 10,
//     "general.name": "LLaMA v2",
//     ...
// }

console.log(tensorInfos);
// [
//     {
//         name: "token_embd.weight",
//         shape: [4096n, 32000n],
//         dtype: GGMLQuantizationType.Q2_K,
//     },

//     ... ,

//     {
//         name: "output_norm.weight",
//         shape: [4096n],
//         dtype: GGMLQuantizationType.F32,
//     }
// ]

Reading a local file

// Reading a local file. (Not supported on browser)
const { metadata, tensorInfos } = await gguf(
  './my_model.gguf',
  { allowLocalFile: true },
);

Strictly typed

By default, known fields in metadata are typed. This includes various fields found in llama.cpp, whisper.cpp and ggml.

const { metadata, tensorInfos } = await gguf(URL_MODEL);

// Type check for model architecture at runtime
if (metadata["general.architecture"] === "llama") {

  // "llama.attention.head_count" is a valid key for llama architecture, this is typed as a number
  console.log(model["llama.attention.head_count"]);

  // "mamba.ssm.conv_kernel" is an invalid key, because it requires model architecture to be mamba
  console.log(model["mamba.ssm.conv_kernel"]); // error
}

Disable strictly typed

Because GGUF format can be used to store tensors, we can technically use it for other usages. For example, storing control vectors, lora weights, etc.

In case you want to use your own GGUF metadata structure, you can disable strict typing by casting the parse output to GGUFParseOutput<{ strict: false }>:

const { metadata, tensorInfos }: GGUFParseOutput<{ strict: false }> = await gguf(URL_LLAMA);

Hugging Face Hub

The Hub supports all file formats and has built-in features for GGUF format.

Find more information at: http://hf.co/docs/hub/gguf.

Acknowledgements & Inspirations

https://github.com/hyparam/hyllama by @platypii (MIT license)
https://github.com/ahoylabs/gguf.js by @biw @dkogut1996 @spencekim (MIT license)

🔥❤️

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

`@huggingface/gguf`

Spec

Install

Usage

Basic usage

Reading a local file

Strictly typed

Disable strictly typed

Hugging Face Hub

Acknowledgements & Inspirations

Files

README.md

Latest commit

History

README.md

File metadata and controls

@huggingface/gguf

Spec

Install

Usage

Basic usage

Reading a local file

Strictly typed

Disable strictly typed

Hugging Face Hub

Acknowledgements & Inspirations

`@huggingface/gguf`