Benign Overparameterization in Membership Inference with Early Stopping

Jasper Tan, Daniel LeJeune, Blake Mason, Hamid Javadi, Richard G. Baraniuk, "Benign Overparameterization in Membership Inference with Early Stopping", arXiv:2205.14055.

Does a neural network's privacy have to be at odds with its accuracy? In this work, we study the effects the number of training epochs and parameters have on a neural network's vulnerability to membership inference (MI) attacks, which aim to extract potentially private information about the training data. We first demonstrate how the number of training epochs and parameters individually induce a privacy-utility trade-off: more of either improves generalization performance at the expense of lower privacy. However, remarkably, we also show that jointly tuning both can eliminate this privacy-utility trade-off. Specifically, with careful tuning of the number of training epochs, more overparameterization can increase model privacy for fixed generalization error. To better understand these phenomena theoretically, we develop a powerful new leave-one-out analysis tool to study the asymptotic behavior of linear classifiers and apply it to characterize the sample-specific loss threshold MI attack in high-dimensional logistic regression. For practitioners, we introduce a low-overhead procedure to estimate MI risk and tune the number of training epochs to guard against MI attacks.

DSP PhD Alum AmirAli Aghazadeh Accepts Faculty Position at Georgia Tech

Rice DSP PhD AmirAli Aghazadeh (PhD, 2017) has accepted an assistant professor position at Georgia Tech in the Department of Electrical and Computer Engineering. He has spent the past few years as a postdoc at Stanford University and UC-Berkeley. AmirAli joins DSP PhD alums James McClellan, Douglas Williams, Justin Romberg, Christopher Rozell, Mark Davenport, and Eva Dyer and ECE PhD alum Robert Butera.

DSP Alum Christopher Rozell Named Julian T. Hightower Chair at Georgia Tech

DSP PhD and postdoc alum Christopher Rozell has been named the Julian T. Hightower Chair at Georgia Tech. Chris has had a storied career so far. For his research, he has received the NSF CAREER Award and Sigma Xi Young Faculty Research Award and been named one of six international recipients of the James S. McDonnell Foundation 21st Century Science Initiative Scholar Award. For his teaching, he has received the Class of 1940 W. Howard Ector Outstanding Teacher Award and the CTL/BP America Junior Faculty Teaching Excellence Award. Previously, Chris held the Demetrius T. Paris Junior Professorship. Chris's research interests lie at the intersection of computational neuroscience and signal processing and aim to understand how neural systems organize and process sensory information.

DSP Faculty Member Richard Baraniuk Elected to the NAE

Richard Baraniuk has been elected to the National Academy of Engineering in recognition of his contributions to engineering "for the development and broad dissemination of open educational resources and for foundational contributions to compressive sensing." Election to the National Academy of Engineering is among the highest professional distinctions accorded to an engineer.

Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference

Jasper Tan, Blake Mason, Hamid Javadi, Richard G. Baraniuk, "Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference", arXiv:2202.01243.

A surprising phenomenon in modern machine learning is the ability of a highly overparameterized model to generalize well (small error on the test data) even when it is trained to memorize the training data (zero error on the training data). This has led to an arms race towards increasingly overparameterized models (cf. deep learning). In this paper, we study an underexplored hidden cost of overparameterization: the fact that overparameterized models are more vulnerable to privacy attacks, in particular the membership inference attack that predicts the (potentially sensitive) examples used to train a model. We significantly extend the relatively few empirical results on this problem by theoretically proving for an overparameterized linear regression model with Gaussian data that the membership inference vulnerability increases with the number of parameters. Moreover, a range of empirical studies indicates that more complex, nonlinear models exhibit the same behavior. Finally, we study different methods for mitigating such attacks in the overparameterized regime, such as noise addition and regularization, and conclude that simply reducing the parameters of an overparameterized model is an effective strategy to protect it from membership inference without greatly decreasing its generalization error.
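The claim above can be illustrated with a minimal sketch. It is not the authors' analysis or code: it assumes a simple loss-threshold attack, a minimum-norm least-squares fit on the first `n_params` Gaussian features, and an attacker who picks the best threshold with oracle knowledge of membership (so the reported accuracy is an upper bound for this attack family). The function name, problem sizes, and noise level are all hypothetical choices.

```python
import numpy as np


def loss_threshold_attack_accuracy(n_params, n_train=50, n_test=50,
                                   n_features=200, noise_std=0.5, seed=0):
    """Best-case accuracy of a loss-threshold membership inference attack
    against minimum-norm linear regression on Gaussian data (illustrative)."""
    rng = np.random.default_rng(seed)

    # Ground-truth linear model with roughly unit-norm coefficients.
    beta = rng.normal(size=n_features) / np.sqrt(n_features)

    # Members (training points) and non-members (fresh points).
    X_train = rng.normal(size=(n_train, n_features))
    X_test = rng.normal(size=(n_test, n_features))
    y_train = X_train @ beta + noise_std * rng.normal(size=n_train)
    y_test = X_test @ beta + noise_std * rng.normal(size=n_test)

    # Fit using only the first n_params features. The pseudoinverse gives
    # ordinary least squares when n_params < n_train and the minimum-norm
    # interpolating solution when n_params >= n_train.
    w = np.linalg.pinv(X_train[:, :n_params]) @ y_train

    # Per-example squared loss of the fitted model.
    loss_train = (X_train[:, :n_params] @ w - y_train) ** 2
    loss_test = (X_test[:, :n_params] @ w - y_test) ** 2

    losses = np.concatenate([loss_train, loss_test])
    is_member = np.concatenate([np.ones(n_train, dtype=bool),
                                np.zeros(n_test, dtype=bool)])

    # Attack rule: guess "member" when the loss is at most a threshold.
    # Sweep every observed loss as a candidate threshold and keep the best
    # accuracy (an oracle choice, assumed here for simplicity).
    best_accuracy = 0.5  # accuracy of guessing "non-member" for everything
    for threshold in losses:
        guess_member = losses <= threshold
        best_accuracy = max(best_accuracy, float(np.mean(guess_member == is_member)))
    return best_accuracy
```

Calling `loss_threshold_attack_accuracy(5)` and `loss_threshold_attack_accuracy(200)` compares an underparameterized fit with an interpolating one. In the interpolating case the training losses are numerically zero, so a threshold separates members from non-members almost perfectly in this toy setting.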

DSP Faculty Member Richard Baraniuk to Present the AMS Josiah Willard Gibbs Lecture

Richard Baraniuk will present the 2023 AMS Josiah Willard Gibbs Lecture at the Joint Mathematics Meetings in Boston, Massachusetts in January 2023. The first AMS Josiah Willard Gibbs Lecture was given in 1923. This public lecture is one of the signature events in the Society’s calendar. Previous speakers have included Albert Einstein, Vannevar Bush, John von Neumann, Norbert Wiener, Kurt Gödel, Hermann Weyl, Eugene Wigner, Donald Knuth, Herb Simon, David Mumford, Ingrid Daubechies, and Claude Shannon.

Rice/UMass Team wins Department of Education IES/NCES Automated Scoring Challenge

Congratulations to Rice DSP PhD students Jack Wang and Lucy Liu and Rice DSP PhD alum Andrew Lan (now an assistant professor of computer science at UMass-Amherst) on winning the Department of Education IES/NCES Automated Scoring Challenge!

Rice DSP to Co-Organize the Second Workshop on the Theory of Overparameterized Machine Learning (TOPML) 2022

The contemporary practice in deep learning has challenged conventional approaches to machine learning. Specifically, deep neural networks are highly overparameterized models with respect to the number of data examples and are often trained without explicit regularization. Yet they achieve state-of-the-art generalization performance. Understanding the overparameterized regime requires new theory and foundational empirical studies. A prominent recent example is the "double descent" behavior of generalization errors that was discovered empirically in deep learning and then very recently analytically characterized for linear regression and related problems in statistical learning.

The goal of this workshop is to cross-fertilize the wide range of theoretical perspectives that will be required to understand overparameterized models, including the statistical, approximation theoretic, and optimization viewpoints. The workshop concept is the first of its kind in this space and enables researchers to discuss not only cutting-edge theoretical studies of the relevant phenomena but also empirical studies that characterize numerical behaviors in a manner that can inspire new theoretical studies.

Invited speakers:

Caroline Uhler, MIT
Francis Bach, École Normale Supérieure
Lenka Zdeborova, EPFL
Vidya Muthukumar, Georgia Tech
Andrea Montanari, Stanford
Daniel Hsu, Columbia University
Jeffrey Pennington, Google Research
Edgar Dobriban, University of Pennsylvania

Organizing committee:

Yehuda Dar, Rice University
Mikhail Belkin, UC San Diego
Gitta Kutyniok, LMU Munich
Ryan Tibshirani, Carnegie Mellon University
Richard Baraniuk, Rice University

Workshop dates: April 5-6, 2022
Virtual event
Free registration
Workshop website: https://topml.rice.edu
Abstract submission deadline: February 17, 2022
Call for Contributions available at https://topml.rice.edu/call-for-contributions-2022/

DSP Faculty Member Richard Baraniuk Awarded the Harold W. McGraw, Jr. Prize in Education

Richard G. Baraniuk, the C. Sidney Burrus Professor of Electrical and Computer Engineering (ECE) and founding director of OpenStax, Rice’s educational technology initiative, has received the Harold W. McGraw, Jr. Prize in Education. The award is given annually by the Harold W. McGraw, Jr. Family Foundation and the University of Pennsylvania Graduate School of Education and goes to “outstanding individuals whose accomplishments are making a difference in the lives of students.” Baraniuk is one of the founders of the Open Education movement, which promotes the use of free and open-source-licensed Open Educational Resources. He founded OpenStax (formerly Connexions) in 1999 as a non-profit educational and scholarly publishing project to bring textbooks and other learning materials into the digital age.

A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

Y. Dar, V. Muthukumar, R. G. Baraniuk, "A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning", arXiv:2109.02355.

The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex with respect to the size of the training dataset, so they perfectly fit (i.e., interpolate) the training data, which is usually noisy. Such interpolation of noisy data is traditionally associated with detrimental overfitting, and yet a wide range of interpolating models -- from simple linear models to deep neural networks -- have recently been observed to generalize extremely well on fresh test data. Indeed, the recently discovered double descent phenomenon has revealed that highly overparameterized models often improve over the best underparameterized model in test performance.

Understanding learning in this overparameterized regime requires new theory and foundational empirical studies, even for the simplest case of the linear model. The underpinnings of this understanding have been laid in very recent analyses of overparameterized linear regression and related statistical learning tasks, which resulted in precise analytic characterizations of double descent. This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.
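As a rough illustration of double descent in the linear model, the sketch below estimates the test error of minimum-norm least squares as the number of fitted features varies. It is an assumption-laden toy rather than the paper's setup: the isotropic Gaussian features, the choice to fit only the first `n_params` coordinates, the function name, and all sizes are hypothetical.

```python
import numpy as np


def min_norm_test_error(n_params, n_train=40, n_test=500, n_features=400,
                        noise_std=0.5, n_trials=20, seed=0):
    """Average test mean squared error of minimum-norm least squares that
    uses only the first n_params of n_features Gaussian features."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        # Fresh ground truth and data for each trial.
        beta = rng.normal(size=n_features) / np.sqrt(n_features)
        X_train = rng.normal(size=(n_train, n_features))
        X_test = rng.normal(size=(n_test, n_features))
        y_train = X_train @ beta + noise_std * rng.normal(size=n_train)
        y_test = X_test @ beta + noise_std * rng.normal(size=n_test)

        # Pseudoinverse solution: least squares below the interpolation
        # threshold (n_params < n_train), minimum-norm interpolator above it.
        w = np.linalg.pinv(X_train[:, :n_params]) @ y_train

        # Mean squared prediction error on fresh data.
        errors.append(np.mean((X_test[:, :n_params] @ w - y_test) ** 2))
    return float(np.mean(errors))


if __name__ == "__main__":
    # Sweep from underparameterized to heavily overparameterized.
    for p in (10, 20, 40, 80, 400):
        print(p, min_norm_test_error(p))
```

In this toy setting the error typically spikes near the interpolation threshold (`n_params` close to `n_train`) and falls again as more parameters are added, which is the qualitative shape that the analyses surveyed in the paper characterize precisely.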