This repository provides a TensorFlow 2 implementation of MRIC, based on the paper *Multi-Realism Image Compression with a Conditional Generator* (CVPR 2023):
Abstract
By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. From a single compressed representation, the receiver can decide to either reconstruct a low mean squared error reconstruction that is close to the input, a realistic reconstruction with high perceptual quality, or anything in between. With our method, we set a new state-of-the-art in distortion-realism, pushing the frontier of achievable distortion-realism pairs, i.e., our method achieves better distortions at high realism and better realism at low distortion than ever before.
01/11/2024
- Initial release of this project
The image below (left) is taken from the CLIC 2020 test set and is external to the training set. The image on the right is its corresponding reconstruction obtained with our MRIC reimplementation.
CLIC 2020: ad24 | Bits per pixel: 0.1501 (59kB)
More example reconstructions can be found here.
We trained two models with this reimplementation.
In this section we quantitatively compare the performance of MRIC (reimpl) to the officially reported numbers. We add VTM-20.0 (state-of-the-art among non-learned codecs) and HiFiC (the long-standing previous state of the art for generative image compression) for completeness. The FID/256 computation is based on Torch-Fidelity, similar to MS-ILLM, as is common in the literature.
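For reference, the FID/256 protocol splits originals and reconstructions into non-overlapping 256×256 patches and computes FID over the two patch sets. The snippet below is a minimal sketch of that procedure using Torch-Fidelity; the directory paths and the helper function are illustrative assumptions, not part of this repository's evaluation code.

```python
# Minimal sketch of the FID/256 protocol (an assumption of how it is commonly
# done, not this repository's evaluation script): split originals and
# reconstructions into non-overlapping 256x256 patches, then compute FID
# over the two patch folders with torch-fidelity.
import os
from PIL import Image
import torch_fidelity


def extract_patches(src_dir, dst_dir, patch=256):
    """Write all non-overlapping patch x patch crops of every image in src_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        w, h = img.size
        for top in range(0, h - patch + 1, patch):
            for left in range(0, w - patch + 1, patch):
                crop = img.crop((left, top, left + patch, top + patch))
                crop.save(os.path.join(dst_dir, f"{name}_{top}_{left}.png"))


extract_patches("originals", "tmp/real_256")        # illustrative paths
extract_patches("reconstructions", "tmp/fake_256")

metrics = torch_fidelity.calculate_metrics(
    input1="tmp/real_256", input2="tmp/fake_256", cuda=True, fid=True)
print(metrics["frechet_inception_distance"])
```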
We generally find that MRIC (reimpl) tends to favor low FID scores over high PSNR values.
For MRIC (
We leave the exploration of better trade-offs to future work.
For MRIC reimpl we use
$ git clone https://github.com/Nikolai10/MRIC.git
This project has been developed using Docker; we recommend the tensorflow:2.14.0-gpu-jupyter Docker image, which we pair with tfc==2.14.0 (the latest release at the time of writing).
A TensorFlow Docker installation guide is provided here.
Please have a look at the example Colab notebook for more information.
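Independently of the notebook, one quick sanity check is to load the exported SavedModel and list its signatures. The path below follows the res/ layout shown later in this README and is an assumption, as is the exact set of exposed signatures; the Colab notebook remains the authoritative usage reference.

```python
# Sketch: load the exported SavedModel and inspect its signatures.
# The path follows the res/ layout further below and is an assumption;
# refer to the Colab notebook for the authoritative usage.
import tensorflow as tf

model = tf.saved_model.load("res/amtm2023")
for name, fn in model.signatures.items():
    print(name)
    print("  inputs: ", fn.structured_input_signature)
    print("  outputs:", fn.structured_outputs)
```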
The general goal of this project is to provide an exact reimplementation of MRIC. In this section we highlight some minor technical deviations from the official work that we have made to achieve a better trade-off between stability and performance for our particular setup.
| | Official | Reimplementation |
|---|---|---|
Data | proprietary dataset | Open Images |
optimization strategy | end-to-end from scratch | multi-stage training (similar to HiFiC, Sec. A6) |
optimization steps | 3M | 2.3M = 2M (stage 1) + 0.3M (stage 2) |
learning rate decay | 1e-4 -> 1e-5 for the last 15% steps | 1e-4 -> 1e-5 for the last 15% steps of stage 1; we use a constant learning rate for stage 2 (1e-4) |
entropy model | small variant of ChARM (10 slices) | TBTC-inspired variant of ChARM (see Figure 12) |
Note that the entropy model probably plays a minor role in the overall optimization procedure; at the time of development, we simply did not have access to the official ChARM configuration.
If you find better hyper-parameters, please share them with the community.
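For concreteness, the learning-rate behaviour listed in the table above (1e-4 throughout, dropped to 1e-5 for the last 15% of stage-1 steps, constant 1e-4 during stage 2) could be expressed as a small Keras schedule along the following lines. This is an illustration of the described schedule rather than the exact code used here, and whether the official drop is abrupt or gradual is an assumption.

```python
import tensorflow as tf


class TwoStageSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Illustrative schedule: 1e-4 throughout, dropped to 1e-5 for the last
    15% of stage-1 steps, then back to a constant 1e-4 during stage 2.
    Whether the drop is abrupt or gradual is an assumption here."""

    def __init__(self, stage1_steps=2_000_000, base_lr=1e-4, low_lr=1e-5):
        self.stage1_steps = stage1_steps
        self.base_lr = base_lr
        self.low_lr = low_lr
        self.decay_start = int(0.85 * stage1_steps)  # last 15% of stage 1

    def __call__(self, step):
        step = tf.cast(step, tf.int64)
        in_decay = tf.logical_and(step >= self.decay_start, step < self.stage1_steps)
        return tf.where(in_decay, self.low_lr, self.base_lr)


# e.g. optimizer = tf.keras.optimizers.Adam(learning_rate=TwoStageSchedule())
```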
All pre-trained models (
- add sophisticated data pre-processing methods (e.g. random resized cropping, random horizontal flipping); see _get_dataset (HiFiC) for some inspiration, and the sketch after this list.
- explore different hyper-parameters; can we obtain a single model that achieves both state-of-the-art results for distortion (MRIC $\beta=0.0$) and perception (MRIC $\beta=2.56$)?
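Regarding the first item, a minimal tf.data-style preprocessing sketch, loosely inspired by HiFiC's _get_dataset, might look as follows; the crop size, scale range and normalization are illustrative choices, not values used in this repository.

```python
import tensorflow as tf


def preprocess(image, crop_size=256):
    """Random resized crop + random horizontal flip (illustrative values);
    assumes source images are comfortably larger than crop_size."""
    scale = tf.random.uniform([], 0.75, 1.25)  # mild random rescaling
    new_size = tf.cast(scale * tf.cast(tf.shape(image)[:2], tf.float32), tf.int32)
    image = tf.image.resize(image, new_size)
    image = tf.image.random_crop(image, [crop_size, crop_size, 3])
    image = tf.image.random_flip_left_right(image)
    return image / 255.0  # uint8 [0, 255] -> float [0, 1]


# e.g. dataset = dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
```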
Note that we have taken great care to follow the official works; e.g., if you are already familiar with HiFiC, you will find that hific_tf2 follows the exact same structure (the same applies to compare_gan_tf2 and amtm2023.py).
res
├── data/                # e.g. training data, LPIPS weights, etc.
├── doc/                 # additional resources
├── eval/                # sample images + reconstructions
├── train_amtm2023/      # model checkpoints + tf.summaries
├── amtm2023/            # saved model
src
├── compare_gan_tf2/     # partial TF 2 port of compare_gan (mirrors structure)
│   ├── arch_ops.py      # building blocks used in PatchGAN
│   ├── loss_lib.py      # non-saturating GAN loss
│   └── utils.py         # convenience utilities
├── hific_tf2/           # partial TF 2 port of HiFiC (mirrors structure)
│   ├── archs.py         # PatchGAN discriminator
│   ├── helpers.py       # LPIPS downloader
│   └── model.py         # perceptual loss
├── amtm2023.py          # >> core of this repo <<
├── config.py            # configurations
├── elic.py              # ELIC transforms based on VCT
├── fourier_cond.py      # Fourier conditioning
├── synthesis.py         # conditional synthesis transform
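As a rough illustration of the idea behind fourier_cond.py (NeRF-style Fourier features used to embed the conditioning value $\beta$; see the acknowledgements below), consider the sketch that follows. The number of frequencies is an assumption, not the value used in this repository.

```python
import math
import tensorflow as tf


def fourier_features(beta, num_frequencies=8):
    """NeRF-style encoding of a scalar conditioning value such as beta:
    [sin(2^0 * pi * x), cos(2^0 * pi * x), ..., sin(2^(L-1) * pi * x), ...].
    The number of frequencies here is an illustrative choice."""
    beta = tf.convert_to_tensor(beta, tf.float32)                 # [batch]
    freqs = 2.0 ** tf.range(num_frequencies, dtype=tf.float32)    # [L]
    angles = math.pi * beta[..., None] * freqs                    # [batch, L]
    return tf.concat([tf.sin(angles), tf.cos(angles)], axis=-1)   # [batch, 2L]
```

The resulting feature vector would then typically be projected by a small dense layer before modulating the conditional synthesis transform.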
This project is based on:
- TensorFlow Compression (TFC), a TF library dedicated to data compression. In particular, we base our work on the well-known MS2020 and HiFiC models, closely following the official structure.
- VCT: A Video Compression Transformer - we make use of the ELIC analysis transform.
- NeRF: Neural Radiance Fields - we make use of the Fourier feature computation.
- compare_gan, a TF 1 library dedicated to GANs - we translate some functionality to TF 2.
We thank the authors for providing us with the official evaluation points as well as helpful insights.