Zechen Bai¹, Jianxiong Gao², Ziteng Gao¹, Pichao Wang³, Zheng Zhang³, Tong He³, Mike Zheng Shou¹
arXiv 2024
¹ Show Lab, National University of Singapore; ² Fudan University; ³ Amazon
News
- [2024-11-28] The code and model will be released soon after internal approval!
- [2024-11-26] We released our paper on arXiv.
FQGAN is a state-of-the-art visual tokenizer with a novel factorized tokenization design that surpasses VQ and LFQ methods in discrete image reconstruction.
FQGAN addresses the codebook usage problem that arises with large codebooks by decomposing a single large codebook into multiple independent sub-codebooks. Through disentanglement regularization and representation learning objectives, the sub-codebooks learn hierarchical, structured, and semantically meaningful representations.
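The idea of replacing one large codebook with several independent sub-codebooks can be illustrated with a minimal sketch. This is not the released FQGAN code: the function names (`nearest_code`, `factorized_quantize`, `make_codebook`), the random codebooks, and in particular the choice to route the feature to sub-codebooks by splitting it into equal chunks are all assumptions made for illustration. The disentanglement regularization and representation learning objectives are training-time losses and are not shown.

```python
import random


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def factorized_quantize(feature, sub_codebooks):
    """Quantize one feature vector with several independent sub-codebooks.

    Assumption for this sketch: the feature is split into equal chunks,
    and chunk i is quantized only against sub-codebook i.
    """
    chunk = len(feature) // len(sub_codebooks)
    indices, quantized = [], []
    for i, codebook in enumerate(sub_codebooks):
        part = feature[i * chunk:(i + 1) * chunk]
        idx = nearest_code(part, codebook)
        indices.append(idx)              # one discrete token per sub-codebook
        quantized.extend(codebook[idx])  # concatenate the selected codes
    return indices, quantized


def make_codebook(size, dim, rng):
    """Hypothetical helper: a random codebook standing in for a learned one."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(size)]


rng = random.Random(0)
sub_codebooks = [make_codebook(8, 2, rng) for _ in range(2)]  # 2 sub-codebooks, 8 codes each
feature = [rng.uniform(-1.0, 1.0) for _ in range(4)]          # one 4-dimensional feature

indices, quantized = factorized_quantize(feature, sub_codebooks)

# The number of distinct index combinations is the product of sub-codebook sizes.
effective_vocab = 1
for codebook in sub_codebooks:
    effective_vocab *= len(codebook)
```

In this toy setting, two sub-codebooks of 8 entries each store only 16 code vectors yet index 8 × 8 = 64 distinct combinations, which is the sense in which factorization keeps each lookup small while the joint vocabulary stays large.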
To cite the paper and model, please use the following BibTeX entry:
@article{bai2024factorized,
title={Factorized Visual Tokenization and Generation},
author={Bai, Zechen and Gao, Jianxiong and Gao, Ziteng and Wang, Pichao and Zhang, Zheng and He, Tong and Shou, Mike Zheng},
journal={arXiv preprint arXiv:2411.16681},
year={2024}
}