Visual Grounding for Object Questions

Introduces Visual Grounding for Object Questions (VGOQ), a new task for grounding the visual evidence or context that supports answering general questions about objects, including evidence beyond the directly visible elements.

To appear in CVPR 2026
Covariance Mismatch in Diffusion Models

Investigates the covariance mismatch between noise and data in diffusion models and its impact on image generation.

Preprint 2024
Exploiting the Signal-Leak Bias in Diffusion Models

Examines and leverages the signal-leak bias in diffusion models for improved image generation.

WACV 2024
Diffusion in Style

Customizes Stable Diffusion's output style by adapting the initial noise distribution, making style adaptation more sample-efficient and faster.

ICCV 2023
VETIM: Expanding the Vocabulary of Text-to-Image Models only with Text

Expands text-to-image models' vocabulary by learning new token embeddings from textual descriptions alone, without requiring sample images.

BMVC 2023
Estimating Image Depth in the Comics Domain

Estimates depth in comic book images by converting them to natural images and filtering out text to improve accuracy.

WACV 2022
Scene Relighting with Illumination Estimation in the Latent Space

Transfers lighting conditions between images by estimating and manipulating illumination in the latent space of an encoder-decoder network.

arXiv 2020
More works from Image and Visual Representation Lab (IVRL)

Also check out more work from our labmates at the Image and Visual Representation Lab (IVRL) at EPFL.


Covariance Mismatch in Diffusion Models

Image and Visual Representation Lab (IVRL), EPFL, Switzerland

Abstract

Diffusion models are usually trained using isotropic noise. Yet, common data distributions are strongly anisotropic. We explain how this covariance mismatch negatively impacts the model in several ways. It leads to the model predicting only the high-variance components at high noise levels and only the low-variance components at low noise levels, requiring a wide range of noise levels during training and inference to model all components accurately. This partition of components across noise levels also prevents smaller timesteps from correcting the predictions of larger timesteps, and limits diffusion editing to only low-variance components. We present two approaches to realign the noise and data covariances: whitening the data distribution or coloring the noise distribution. We apply our approach to 2D point distributions and, using a Fourier-based approach, to images. Realigning the covariances allows the model to focus more equally on all components, improving editing and enabling fewer noise levels in training. Models trained with realigned covariances offer greater flexibility in the choice of timesteps during inference and can even generate reasonable output when trained on just a single timestep.
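The two realignment approaches can be illustrated on a toy anisotropic 2D Gaussian. This is a minimal NumPy sketch of our own (not the paper's code): whitening transforms the data so its covariance becomes the identity, matching isotropic noise, while coloring instead samples noise whose covariance matches that of the data. The specific covariance values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy anisotropic data distribution: a strongly stretched 2D Gaussian.
cov = np.array([[9.0, 0.0], [0.0, 0.25]])
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)

# Empirical covariance and its eigendecomposition (symmetric PSD).
emp_cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(emp_cov)

# --- Approach 1: whiten the data distribution ---
# Map data through Sigma^{-1/2} so its covariance becomes the identity,
# matching the isotropic noise used in standard diffusion training.
whitening = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
white_data = data @ whitening.T
print(np.cov(white_data, rowvar=False).round(2))  # ~ identity

# --- Approach 2: color the noise distribution ---
# Map isotropic noise through Sigma^{1/2} so its covariance matches
# the data covariance instead.
coloring = eigvecs @ np.diag(eigvals ** 0.5) @ eigvecs.T
colored_noise = rng.standard_normal(data.shape) @ coloring.T
print(np.cov(colored_noise, rowvar=False).round(2))  # ~ emp_cov
```

For images, the same idea can be applied per frequency: since natural-image covariances are approximately diagonalized by the Fourier basis, coloring amounts to scaling the noise spectrum to match the data spectrum.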

Citation

Please use the following BibTeX entry to cite our paper:

@article{everaert2024covariancemismatch,
  title    = {{C}ovariance {M}ismatch in {D}iffusion {M}odels},
  author   = {Everaert, Martin Nicolas and S\"usstrunk, Sabine and Achanta, Radhakrishna},
  journal  = {Infoscience preprint Infoscience:20.500.14299/242173},
  month    = {November},
  year     = {2024},
  url      = {https://infoscience.epfl.ch/handle/20.500.14299/242173},
}