GitHub - showlab/FQGAN: FQGAN: Factorized Visual Tokenization and Generation

Factorized Visual Tokenization and Generation

Zechen Bai¹, Jianxiong Gao², Ziteng Gao¹, Pichao Wang³, Zheng Zhang³, Tong He³, Mike Zheng Shou¹

arXiv 2024

¹ Show Lab, National University of Singapore; ² Fudan University; ³ Amazon

News

[2024-11-28] The code and model will be released soon after internal approval!
[2024-11-26] We released our paper on arXiv.

TL;DR

FQGAN is a state-of-the-art visual tokenizer with a novel factorized tokenization design. It surpasses vector quantization (VQ) and lookup-free quantization (LFQ) methods in discrete image reconstruction.

Method Overview

FQGAN addresses the difficulty of training and fully utilizing a single large codebook by decomposing it into multiple independent sub-codebooks. By leveraging a disentanglement regularization and representation learning objectives, the sub-codebooks learn hierarchical, structured, and semantically meaningful representations. FQGAN achieves state-of-the-art performance on discrete image reconstruction, surpassing VQ and LFQ methods.
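The factorization idea can be illustrated with a small NumPy sketch. It is not the official FQGAN code. It assumes that each sub-codebook quantizes its own d-dimensional feature branch (for example, a separate projection of a shared encoder output) by nearest-neighbour lookup, that the quantized branches are concatenated before decoding, and it uses a squared-cosine-similarity penalty as one plausible stand-in for the disentanglement regularization. The function names `nearest_code`, `factorized_quantize`, and `disentangle_penalty` are hypothetical, and the representation learning objectives are not modelled.

```python
import numpy as np


def nearest_code(features, codebook):
    """Return (indices, vectors) of the nearest codebook entry for each feature row."""
    # Squared Euclidean distance between every feature and every code: shape (N, K).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]


def factorized_quantize(branch_features, sub_codebooks):
    """Quantize each feature branch with its own independent sub-codebook.

    Hypothetical layout: branch_features is a list of k arrays of shape (N, d),
    sub_codebooks is a list of k arrays of shape (K_i, d).
    Returns token indices of shape (N, k) and a list of k quantized (N, d) arrays.
    """
    results = [nearest_code(f, c) for f, c in zip(branch_features, sub_codebooks)]
    indices = np.stack([idx for idx, _ in results], axis=1)
    quantized = [q for _, q in results]
    return indices, quantized


def disentangle_penalty(q_a, q_b, eps=1e-8):
    """Mean squared cosine similarity between two branches' quantized vectors.

    Assumed regularizer: it is zero when the two vectors of every token are
    orthogonal, so minimizing it discourages redundant sub-codebooks.
    """
    a = q_a / (np.linalg.norm(q_a, axis=1, keepdims=True) + eps)
    b = q_b / (np.linalg.norm(q_b, axis=1, keepdims=True) + eps)
    return float(np.mean(np.sum(a * b, axis=1) ** 2))


# Toy usage: two sub-codebooks with 16 random codes each, 6 tokens, d = 8.
rng = np.random.default_rng(0)
N, d = 6, 8
codebooks = [rng.normal(size=(16, d)), rng.normal(size=(16, d))]
feats = [rng.normal(size=(N, d)), rng.normal(size=(N, d))]
idx, q = factorized_quantize(feats, codebooks)
decoder_input = np.concatenate(q, axis=1)  # what a decoder would consume
print(idx.shape, decoder_input.shape)  # (6, 2) (6, 16)
print(disentangle_penalty(q[0], q[1]))
```

Under these assumptions, each token is described by k indices, so k sub-codebooks of sizes K_1, ..., K_k can express K_1 × ... × K_k index combinations while only K_1 + ... + K_k code vectors are stored and searched. In the toy example, two 16-entry sub-codebooks give 256 combinations from 32 stored codes.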

Comparison with previous visual tokenizers

What has each sub-codebook learned?

Can this tokenizer be used for downstream image generation?

Citation

To cite the paper and model, please use the following BibTeX entry:

@article{bai2024factorized,
  title={Factorized Visual Tokenization and Generation},
  author={Bai, Zechen and Gao, Jianxiong and Gao, Ziteng and Wang, Pichao and Zhang, Zheng and He, Tong and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2411.16681},
  year={2024}
}
