Factorized Visual Tokenization and Generation

Zechen Bai 1  Jianxiong Gao 2  Ziteng Gao 1  Pichao Wang 3  Zheng Zhang 3  Tong He 3  Mike Zheng Shou 1 

arXiv 2024

1 Show Lab, National University of Singapore   2 Fudan University  3 Amazon 

arXiv

News

  • [2024-11-28] The code and model will be released soon after internal approval!
  • [2024-11-26] We released our paper on arXiv.

TL;DR

FQGAN is a state-of-the-art visual tokenizer with a novel factorized tokenization design, surpassing VQ and LFQ methods in discrete image reconstruction.

Method Overview

FQGAN addresses the poor codebook utilization of large codebooks by decomposing a single large codebook into multiple independent sub-codebooks. By leveraging disentanglement regularization and representation-learning objectives, the sub-codebooks learn hierarchical, structured, and semantically meaningful representations. FQGAN achieves state-of-the-art performance on discrete image reconstruction, surpassing VQ and LFQ methods.
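The idea of factorized quantization can be sketched as follows. This is an illustrative toy example, not the authors' released implementation: the chunking scheme, codebook sizes, and the `factorized_quantize` helper are all hypothetical, and the regularization and representation-learning objectives are omitted.

```python
import numpy as np

# Toy sketch (assumption, not the official FQGAN code): instead of matching a
# latent vector against one large codebook, split it into K chunks and quantize
# each chunk against its own independent sub-codebook.

rng = np.random.default_rng(0)

K = 2              # number of sub-codebooks (hypothetical setting)
dim = 8            # dimension handled by each sub-codebook
codebook_size = 16 # entries per sub-codebook

sub_codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(K)]

def factorized_quantize(z):
    """Quantize a latent vector z of shape (K * dim,) chunk-by-chunk."""
    chunks = np.split(z, K)
    codes, quantized = [], []
    for chunk, book in zip(chunks, sub_codebooks):
        # nearest-neighbor lookup within this sub-codebook only
        idx = int(np.argmin(np.linalg.norm(book - chunk, axis=1)))
        codes.append(idx)
        quantized.append(book[idx])
    return codes, np.concatenate(quantized)

z = rng.normal(size=K * dim)
codes, z_q = factorized_quantize(z)
```

Each chunk yields one discrete code, so an image token becomes a tuple of K indices rather than a single index; the effective vocabulary grows multiplicatively (here 16 × 16 = 256) while each lookup stays small.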

Comparison with previous visual tokenizers

What has each sub-codebook learned?

Can this tokenizer be used for downstream image generation?

Citation

To cite the paper and model, please use the BibTeX entry below:

```bibtex
@article{bai2024factorized,
  title={Factorized Visual Tokenization and Generation},
  author={Bai, Zechen and Gao, Jianxiong and Gao, Ziteng and Wang, Pichao and Zhang, Zheng and He, Tong and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2411.16681},
  year={2024}
}
```