Covariance Mismatch in Diffusion Models

Diffusion models are usually trained with isotropic noise, yet common data distributions are strongly anisotropic. We explain how this covariance mismatch harms the model in several ways. At high noise levels the model predicts only the high-variance components, and at low noise levels only the low-variance components, so a wide range of noise levels is required during training and inference to model all components accurately. This partition of components across noise levels also prevents smaller timesteps from correcting the predictions of larger timesteps and limits diffusion editing to low-variance components. We present two approaches to realigning the noise and data covariances: whitening the data distribution or coloring the noise distribution. We apply both to 2D point distributions and, using a Fourier-based approach, to images. Realigning the covariances lets the model attend more equally to all components, which improves editing and allows training with fewer noise levels. Models trained with realigned covariances offer greater flexibility in the choice of timesteps during inference and can generate reasonable output even when trained on just a single timestep.
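The two realignment options can be illustrated on a toy 2D point distribution. The sketch below is only an illustration under stated assumptions: it uses an eigendecomposition-based (ZCA-style) transform estimated from samples, and the helper name `fit_whitening`, the example covariance, and the sample count are hypothetical choices, not taken from the paper.

```python
import numpy as np


def fit_whitening(x, eps=1e-8):
    """Estimate the mean plus whitening and coloring matrices from samples x of shape (n, d).

    Assumption: a symmetric (ZCA-style) transform built from the sample covariance.
    """
    mean = x.mean(axis=0)
    cov = np.cov(x - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.maximum(eigvals, eps)  # guard against near-zero variance
    whiten = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T  # cov^(-1/2)
    color = eigvecs @ np.diag(eigvals ** 0.5) @ eigvecs.T    # cov^(+1/2)
    return mean, whiten, color


rng = np.random.default_rng(0)

# Hypothetical anisotropic 2D data: one direction has far more variance than the other.
true_cov = np.array([[4.0, 1.9], [1.9, 1.0]])
data = rng.multivariate_normal([1.0, -2.0], true_cov, size=20000)

mean, whiten, color = fit_whitening(data)

# Option 1: whiten the data so that it matches isotropic noise.
data_white = (data - mean) @ whiten.T

# Option 2: color the isotropic noise so that it matches the data covariance.
noise = rng.standard_normal(data.shape)
noise_colored = noise @ color.T

cov_white = np.cov(data_white, rowvar=False)
cov_noise = np.cov(noise_colored, rowvar=False)
print(np.round(cov_white, 3))
print(np.round(cov_noise, 3))
```

In either case the noise and the data share the same covariance up to sampling error; a model trained on whitened data would need its samples mapped back with the coloring matrix and the mean.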
