Rice DSP graduate student Lorenzo Luzi successfully defended his PhD thesis entitled "Overparameterization and double descent in PCA, GANs, and Diffusion models".
Abstract: I study overparameterization in PCA, generative adversarial networks (GANs), and generative models more broadly. Specifically, I study models that can interpolate the training data. I show that overparameterization can improve generalization performance and accelerate the training process in several contexts. I study the generalization error as a function of latent space dimension and identify two main behaviors, depending on the learning setting. First, I show that overparameterized generative models that learn distributions by minimizing a metric or f-divergence do not exhibit double descent in their generalization error; specifically, all interpolating solutions achieve the same generalization error. Second, I develop a new pseudo-supervised learning approach for GANs and diffusion models in which training pairs fabricated (noise) inputs with real output samples. This pseudo-supervised setting exhibits double descent (and in some cases, triple descent) of the generalization error. I combine pseudo-supervision with overparameterization (i.e., an overly large latent space dimension) to accelerate training while achieving generalization performance that is better than, or close to, that obtained without pseudo-supervision. While my analysis focuses mostly on linear models, I also apply key insights to improve the generalization of nonlinear, multilayer GANs. For diffusion models, I show that pseudo-supervised samples can improve both the performance and the convergence speed of the learning algorithm.
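To make the pseudo-supervised pairing concrete, here is a minimal PyTorch sketch of the core idea as described in the abstract: fixed fabricated noise inputs are matched once with real output samples, and the generator is trained with a supervised regression loss on those pairs (in a full GAN, this term would be added to the adversarial objective). All names, dimensions, and the choice of squared-error loss are illustrative assumptions, not the thesis's exact formulation.

```python
# Hypothetical sketch of pseudo-supervised generator training.
import torch
import torch.nn as nn

latent_dim, data_dim, n_pairs = 256, 32, 128  # assumed sizes

# A linear generator, matching the abstract's focus on linear models.
# latent_dim > n_pairs reflects the overparameterized regime, where
# many interpolating solutions for the pairs exist.
G = nn.Linear(latent_dim, data_dim, bias=False)

# Pseudo-supervised pairs: fabricated (noise) inputs z_i matched up
# front with real output samples x_i (synthetic stand-ins here).
z_pairs = torch.randn(n_pairs, latent_dim)
x_pairs = torch.randn(n_pairs, data_dim)  # would be real data in practice

opt = torch.optim.SGD(G.parameters(), lr=1e-2)
mse = nn.MSELoss()

for step in range(1000):
    opt.zero_grad()
    # Supervised term: push G(z_i) toward its paired real sample x_i.
    loss = mse(G(z_pairs), x_pairs)
    # In a full GAN this would be combined with the adversarial loss,
    # e.g. loss = adv_loss + lam * mse(G(z_pairs), x_pairs).
    loss.backward()
    opt.step()
```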