Model format
Peter Major edited this page Jun 16, 2023 · 2 revisions

Unpaint accepts Stable Diffusion models in the ONNX format. Each model must contain the following files:

| Path | Role |
|---|---|
| `feature_extractor/preprocessor_config.json` | Configuration for feature extraction |
| `safety_checker/model.onnx` | Model for the safety check |
| `scheduler/scheduler_config.json` | Configuration for the denoising scheduler |
| `text_encoder/model.onnx` | Model for text encoding |
| `tokenizer/merges.txt`, `tokenizer/special_tokens_map.json`, `tokenizer/tokenizer_config.json`, `tokenizer/vocab.json` | Configuration for text tokenization |
| `unet/model.onnx` | Model for denoising |
| `vae_decoder/model.onnx` | Model for VAE decoding |
| `vae_encoder/model.onnx` | Model for VAE encoding |

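As a quick sanity check before loading a model, the file list above can be verified programmatically. The following is a minimal sketch, not part of Unpaint itself; the names `REQUIRED_FILES` and `missing_model_files` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path

# Required files, relative to the model directory (taken from the table above).
REQUIRED_FILES = [
    "feature_extractor/preprocessor_config.json",
    "safety_checker/model.onnx",
    "scheduler/scheduler_config.json",
    "text_encoder/model.onnx",
    "tokenizer/merges.txt",
    "tokenizer/special_tokens_map.json",
    "tokenizer/tokenizer_config.json",
    "tokenizer/vocab.json",
    "unet/model.onnx",
    "vae_decoder/model.onnx",
    "vae_encoder/model.onnx",
]


def missing_model_files(model_dir):
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty result means every required file is present; otherwise the returned list names the files to add.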

The safety checker (`safety_checker/model.onnx`) checks whether an image is safe for work (SFW).

Input:
- `float16 clip_input[batch, channels, height, width]`: the input image in color space, usually 224 x 224 pixels, with colors scaled to the range 0 .. 1 and then normalized, stored as planar color channels
- `float16 images[batch, height, width, channels]`: the input image in color space at its original size, with colors scaled to the range 0 .. 1

Output:
- `float16 out_images[batch, height, width, channels]`: the input image if it is safe, otherwise a black image
- `boolean has_nsfw_concepts[batch]`: one boolean value for each input image in the batch, true if the image is unsafe and false otherwise

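The `clip_input` layout (scaled, normalized, planar) can be illustrated with a small pure-Python sketch. It assumes 8-bit interleaved input pixels and uses the commonly published CLIP mean and standard deviation as defaults; those values are an assumption here and in practice should be read from `feature_extractor/preprocessor_config.json`. Resizing to 224 x 224 is left out, and `to_clip_input` is a hypothetical helper name.

```python
# Assumed defaults (commonly used CLIP statistics); verify them against
# feature_extractor/preprocessor_config.json for the model at hand.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def to_clip_input(image_hwc, mean=CLIP_MEAN, std=CLIP_STD):
    """Convert one interleaved 8-bit image [height][width][channels]
    into planar, normalized floats [channels][height][width]."""
    planes = []
    for c in range(len(mean)):
        # Scale to 0 .. 1 first, then normalize with the channel statistics.
        plane = [
            [(pixel[c] / 255.0 - mean[c]) / std[c] for pixel in row]
            for row in image_hwc
        ]
        planes.append(plane)
    return planes
```

The `images` input, by contrast, keeps the interleaved `[height, width, channels]` layout and only needs the scaling to 0 .. 1.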

The text encoder (`text_encoder/model.onnx`) encodes tokenized text as an embedding.

Input:
- `int32 input_ids[batch, sequence]`: the input token IDs, usually with the shape `[batch, 77]`

Output:
- `float16 last_hidden_state[batch, sequence, 768]`: the text embedding (also known as the last hidden state)

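To show how a token list ends up with the fixed sequence length of 77, here is a minimal sketch of padding and truncation. It assumes the CLIP vocabulary IDs 49406 (start of text) and 49407 (end of text), and that the end-of-text token doubles as padding; check both against `tokenizer/vocab.json` and `tokenizer/special_tokens_map.json`. `to_input_ids` is a hypothetical helper name.

```python
BOS_ID = 49406  # assumed ID of <|startoftext|> in the CLIP vocabulary
EOS_ID = 49407  # assumed ID of <|endoftext|>, also used here as padding
SEQUENCE_LENGTH = 77


def to_input_ids(token_ids, length=SEQUENCE_LENGTH):
    """Wrap raw token IDs with start/end markers, then pad or truncate
    the result to exactly `length` entries."""
    body = list(token_ids)[: length - 2]  # leave room for start and end
    ids = [BOS_ID] + body + [EOS_ID]
    return ids + [EOS_ID] * (length - len(ids))
```

One such row per prompt, stacked along the batch dimension, gives the `input_ids` tensor.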

The U-Net (`unet/model.onnx`) denoises images.

Input:
- `float16 sample[batch, channels, height, width]`: the image to denoise, in latent space
- `float16 timestep[batch]`: the timestep for denoising
- `float16 encoder_hidden_state[batch, sequence, 768]`: the text embedding (also known as the last hidden state)

Output:
- `float16 out_sample[batch, channels, height, width]`: the denoised image, in latent space


The VAE decoder (`vae_decoder/model.onnx`) converts images from latent space to color space.

Input:
- `float16 latent_sample[batch, channels, height, width]`: an image in latent space

Output:
- `float16 sample[batch, channels, height, width]`: an image in color space


The VAE encoder (`vae_encoder/model.onnx`) converts images from color space to latent space.

Input:
- `float16 sample[batch, channels, height, width]`: an image in color space

Output:
- `float16 latent_sample[batch, channels, height, width]`: an image in latent space
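
The tensor shapes listed for the text encoder, U-Net and VAE models fit together as sketched below. This is an illustration only: the sequence length 77 and embedding size 768 come from this page, while 4 latent channels, 3 color channels and a downscale factor of 8 are assumptions that hold for typical Stable Diffusion 1.x models and may differ for others. `expected_shapes` is a hypothetical helper name.

```python
LATENT_CHANNELS = 4    # assumption: typical for Stable Diffusion 1.x
COLOR_CHANNELS = 3     # assumption: RGB images
VAE_SCALE = 8          # assumption: latent is 1/8 of the image size
SEQUENCE_LENGTH = 77   # from this page
EMBEDDING_SIZE = 768   # from this page


def expected_shapes(batch, height, width):
    """Return the expected tensor shapes for an image of the given size."""
    if height % VAE_SCALE or width % VAE_SCALE:
        raise ValueError("height and width must be multiples of VAE_SCALE")
    latent = (batch, LATENT_CHANNELS, height // VAE_SCALE, width // VAE_SCALE)
    color = (batch, COLOR_CHANNELS, height, width)
    embedding = (batch, SEQUENCE_LENGTH, EMBEDDING_SIZE)
    return {
        "text_encoder.input_ids": (batch, SEQUENCE_LENGTH),
        "text_encoder.last_hidden_state": embedding,
        "unet.sample": latent,
        "unet.timestep": (batch,),
        "unet.encoder_hidden_state": embedding,
        "unet.out_sample": latent,
        "vae_decoder.latent_sample": latent,
        "vae_decoder.sample": color,
        "vae_encoder.sample": color,
        "vae_encoder.latent_sample": latent,
    }
```

Under these assumptions, a single 512 x 512 image corresponds to a latent of shape `(1, 4, 64, 64)`.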