
FQGAN: Factorized Visual Tokenization and Generation

showlab/FQGAN



Factorized Visual Tokenization and Generation

Zechen Bai¹, Jianxiong Gao², Ziteng Gao¹, Pichao Wang³, Zheng Zhang³, Tong He³, Mike Zheng Shou¹

arXiv 2024

¹ Show Lab, National University of Singapore   ² Fudan University   ³ Amazon


News

  • [2024-11-28] The code and model will be released soon after internal approval!
  • [2024-11-26] We released our paper on arXiv.

TL;DR

FQGAN is a state-of-the-art visual tokenizer with a novel factorized tokenization design, surpassing VQ and LFQ methods in discrete image reconstruction.

Method Overview

FQGAN addresses the difficulty of making effective use of a large codebook by decomposing a single large codebook into multiple independent sub-codebooks. By leveraging disentanglement regularization and representation learning objectives, the sub-codebooks learn hierarchical, structured, and semantically meaningful representations. FQGAN achieves state-of-the-art performance on discrete image reconstruction, surpassing VQ and LFQ methods.
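Since the official code has not been released yet, the following is only a minimal NumPy sketch of the general idea of factorized quantization, not the authors' implementation. It assumes that each feature vector is split evenly along the channel dimension and that each chunk is matched to its own sub-codebook by nearest-neighbour lookup; the function names `quantize` and `factorized_quantize`, the channel split, and all sizes are hypothetical choices made for illustration. The disentanglement regularization and representation learning objectives are training losses and are not shown.

```python
import numpy as np


def quantize(features, codebook):
    """Nearest-neighbour lookup.

    features: (N, D) array, codebook: (K, D) array.
    Returns the chosen code indices (N,) and the quantized vectors (N, D).
    """
    # Squared Euclidean distance between every feature and every code: (N, K).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


def factorized_quantize(features, sub_codebooks):
    """Quantize each channel chunk with its own independent sub-codebook.

    Assumption (illustrative only): the feature dimension is split evenly
    across the sub-codebooks. Returns indices (N, num_codebooks) and the
    concatenated quantized features (N, D).
    """
    chunks = np.split(features, len(sub_codebooks), axis=1)
    all_indices, all_quantized = [], []
    for chunk, codebook in zip(chunks, sub_codebooks):
        idx, quant = quantize(chunk, codebook)
        all_indices.append(idx)
        all_quantized.append(quant)
    return np.stack(all_indices, axis=1), np.concatenate(all_quantized, axis=1)


# Hypothetical sizes: 2 sub-codebooks with 16 codes of dimension 4 each.
rng = np.random.default_rng(0)
sub_codebooks = [rng.normal(size=(16, 4)) for _ in range(2)]
features = rng.normal(size=(5, 8))  # 5 feature vectors of dimension 8

indices, quantized = factorized_quantize(features, sub_codebooks)
```

Each feature vector is represented here by one index per sub-codebook, so two sub-codebooks of 16 codes can express 16 × 16 = 256 distinct combinations while each lookup only searches 16 entries.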

Comparison with previous visual tokenizers

What has each sub-codebook learned?

Can this tokenizer be used for downstream image generation?

Citation

To cite the paper and model, please use the following BibTeX entry:

@article{bai2024factorized,
  title={Factorized Visual Tokenization and Generation},
  author={Bai, Zechen and Gao, Jianxiong and Gao, Ziteng and Wang, Pichao and Zhang, Zheng and He, Tong and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2411.16681},
  year={2024}
}
