Visual Grounding for Object Questions

Introduces Visual Grounding for Object Questions (VGOQ), a new task for grounding the visual evidence or context that supports answering general questions about objects, including evidence beyond the directly visible elements.

To appear in CVPR 2026
Covariance Mismatch in Diffusion Models

Investigates the covariance mismatch between noise and data in diffusion models and its impact on image generation.

Preprint 2024
Exploiting the Signal-Leak Bias in Diffusion Models

Examines and leverages the signal-leak bias in diffusion models for improved image generation.

WACV 2024
Diffusion in Style

Customizes Stable Diffusion's output style by adapting the initial noise distribution, making style adaptation more sample-efficient and faster.

ICCV 2023
VETIM: Expanding the Vocabulary of Text-to-Image Models only with Text

Expands text-to-image models' vocabulary by learning new token embeddings from textual descriptions alone, without requiring sample images.

BMVC 2023
Estimating Image Depth in the Comics Domain

Estimates depth in comic book images by converting them to natural images and filtering out text to improve accuracy.

WACV 2022
Scene Relighting with Illumination Estimation in the Latent Space

Transfers lighting conditions between images by estimating and manipulating illumination in the latent space of an encoder-decoder network.

arXiv 2020
More works from Image and Visual Representation Lab (IVRL)

Also check out more work from our labmates at the Image and Visual Representation Lab (IVRL) at EPFL.


Covariance Mismatch in Diffusion Models

Image and Visual Representation Lab (IVRL), EPFL, Switzerland

Abstract

Diffusion models are usually trained using isotropic noise. Yet, common data distributions are strongly anisotropic. We explain how this covariance mismatch negatively impacts the model in several ways. It leads to the model predicting only the high-variance components at high noise levels and only the low-variance components at low noise levels, requiring a wide range of noise levels during training and inference to model all components accurately. This partition of components across noise levels also prevents smaller timesteps from correcting the predictions of larger timesteps, and limits diffusion editing to only low-variance components. We present two approaches to realign the noise and data covariances: whitening the data distribution or coloring the noise distribution. We apply our approach to 2D point distributions and, using a Fourier-based approach, to images. Realigning the covariances allows the model to focus more equally on all components, improving editing and enabling fewer noise levels in training. Models trained with realigned covariances offer greater flexibility in the choice of timesteps during inference and can even generate reasonable output when trained on just a single timestep.
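The two realignment approaches can be illustrated on a toy anisotropic 2D Gaussian. This is a minimal NumPy sketch of our own (not the paper's code): whitening transforms the data so its covariance becomes the identity, matching isotropic noise, while coloring instead samples noise whose covariance matches that of the data. The specific covariance values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy anisotropic data distribution: a strongly stretched 2D Gaussian.
cov = np.array([[9.0, 0.0], [0.0, 0.25]])
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)

# Empirical covariance and its eigendecomposition (symmetric PSD).
emp_cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(emp_cov)

# --- Approach 1: whiten the data distribution ---
# Map data through Sigma^{-1/2} so its covariance becomes the identity,
# matching the isotropic noise used in standard diffusion training.
whitening = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
white_data = data @ whitening.T
print(np.cov(white_data, rowvar=False).round(2))  # ~ identity

# --- Approach 2: color the noise distribution ---
# Map isotropic noise through Sigma^{1/2} so its covariance matches
# the data covariance instead.
coloring = eigvecs @ np.diag(eigvals ** 0.5) @ eigvecs.T
colored_noise = rng.standard_normal(data.shape) @ coloring.T
print(np.cov(colored_noise, rowvar=False).round(2))  # ~ emp_cov
```

For images, the same idea can be applied per frequency: since natural-image covariances are approximately diagonalized by the Fourier basis, coloring amounts to scaling the noise spectrum to match the data spectrum.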

Citation

Please use the following BibTeX entry to cite our paper:

@article{everaert2024covariancemismatch,
  title    = {{C}ovariance {M}ismatch in {D}iffusion {M}odels},
  author   = {Everaert, Martin Nicolas and S\"usstrunk, Sabine and Achanta, Radhakrishna},
  journal  = {Infoscience preprint Infoscience:20.500.14299/242173},
  month    = {November},
  year     = {2024},
  url      = {https://infoscience.epfl.ch/handle/20.500.14299/242173},
}