This repository houses the official PyTorch implementation of the paper "PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher", presented at NeurIPS 2024. It covers ImageNet at resolutions ranging from 64x64 to 512x512. Our code is heavily based on CTM.
Contacts:
- Dongjun Kim: dongjun@stanford.edu
- Chieh-Hsin (Jesse) Lai: chieh-hsin.lai@sony.com
We train a one-step text-to-image generator that progressively grows in resolution. This requires only low-resolution diffusion models.
- You may find PaGoDA's checkpoints on ImageNet. They contain:
  - Stage 1: pretrained diffusion models at resolutions 32x32 and 64x64
  - Stage 2: PaGoDA generators at resolutions 32x32 and 64x64
  - Stage 3: PaGoDA generators grown (1) from 64x64 → 128x128; (2) from 64x64 → 256x256; (3) from 64x64 → 512x512
- The preprocessed data-to-noise datasets for training will be available here (released soon).
- For Stage 2 distillation, run
  bash commands/res64to64.sh
- For Stage 3 super-resolution, run the scripts from
  bash commands/64to128.sh
  to
  bash commands/64to512.sh
  sequentially.
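Because the Stage 3 scripts are meant to run one after another, a small wrapper that stops at the first failure can keep a broken stage from feeding into the next one. The following is a minimal sketch, not part of this repository: `run_stages` is a hypothetical helper, and it only assumes that each stage is a standalone script that can be launched with `bash`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repository):
# run each given script in order and stop at the first one that fails.
run_stages() {
    local script
    for script in "$@"; do
        # Refuse to continue if a listed script does not exist.
        if [ ! -f "$script" ]; then
            echo "missing script: $script" >&2
            return 1
        fi
        echo "running: $script"
        # Abort the sequence if this stage exits with a non-zero status.
        bash "$script" || { echo "failed: $script" >&2; return 1; }
    done
}
```

With the function defined, a call such as `run_stages commands/64to128.sh commands/64to512.sh` would run the two scripts named above in order. Any intermediate-resolution scripts belong between them; their file names are not listed here, so check the commands/ directory before adding them.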
Please see commands/sampling.sh for detailed sampling commands.
@article{kim2024pagoda,
title={PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher},
author={Kim, Dongjun and Lai, Chieh-Hsin and Liao, Wei-Hsiang and Takida, Yuhta and Murata, Naoki and Uesaka, Toshimitsu and Mitsufuji, Yuki and Ermon, Stefano},
journal={arXiv preprint arXiv:2405.14822},
year={2024}
}