Research Interests
- High-Dimensional Statistics
- Machine Learning and AI Theory
- Learning under Overparametrization
- Learning under Heterogeneity and Distribution Shifts
Publications and Preprints
Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression (2025+) In Submission [arXiv] (with Anvit Garg and Sohom Bhattacharya)
Multi-Environment GLAMP: Approximate Message Passing for Transfer Learning with Applications to Lasso-based Estimators (2025+) In Review at the IEEE Transactions on Information Theory [arXiv] (with Longlin Wang, Yanke Song and Kuanhao Jiang)
Spectrum-Aware Debiasing: A Modern Inference Framework with Applications to Principal Components Regression (2025) The Annals of Statistics (to appear) [talk][code][arXiv] (with Yufan Li). Yufan Li won the Dempster Award for this work.
Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling (2025) NeurIPS (Spotlight, 3% acceptance rate for this track) [arXiv] (with Yufan Li).
ROTI-GCV: Generalized Cross-Validation for right-ROTationally Invariant Data (2025) AISTATS [code][arXiv] (with Kevin Luo and Yufan Li). Kevin Luo won the Thomas Temple Hoopes Prize for this work.
Predictive Inference in Multi-Environment Scenarios (2025) Statistical Science, Vol. 40, No. 3, 392-416 [code][arXiv][journal] (with John Duchi, Suyash Gupta and Kuanhao Jiang).
A New Central Limit Theorem for the Augmented IPW Estimator: Variance Inflation, Cross-fit Covariance and Beyond (2025) The Annals of Statistics, 53(2), pp. 647-675 [talk][code][arXiv][journal] (with Kuanhao Jiang, Rajarshi Mukherjee and Subhabrata Sen). Kuanhao Jiang won the New England Statistical Society’s Student Research Award for this work.
Generalization Error of Min-Norm Interpolators in Transfer Learning (2024+) Reject & Resubmit at the Annals of Statistics [code][arXiv] (with Yanke Song and Sohom Bhattacharya).
HEDE: Heritability estimation in high dimensions by Ensembling Debiased Estimators (2024+) In Review at The Annals of Applied Statistics [code][arXiv] (with Yanke Song and Xihong Lin).
Universality in block dependent linear models with applications to nonparametric regression (2024) IEEE Transactions on Information Theory, Volume 70, Issue 12, December 2024, Pages 8975-9000 [arXiv] [journal] (with Samriddha Lahiry).
High-dimensional Asymptotics of Langevin Dynamics in Spiked Matrix Models (2023) Information and Inference: A Journal of the IMA, Volume 12, Issue 4, December 2023, Pages 2720–2752 [arXiv] [journal] (with Tengyuan Liang and Subhabrata Sen).
Multi-study boosting: Theoretical Considerations for Merging vs. Ensembling (2022+) In Review at the Electronic Journal of Statistics [arXiv] (with Cathy Shyr, Giovanni Parmigiani and Prasad Patil).
A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers. The Annals of Statistics 50.3 (2022): 1669-1695. [talk] [arXiv] [journal] (with Tengyuan Liang).
A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models. Neural Information Processing Systems (NeurIPS) 2022. [NeurIPS version] (with Lijia Zhou, Frederic Koehler, Danica J. Sutherland and Nathan Srebro).
The Asymptotic Distribution of the MLE in High-dimensional Logistic Models: Arbitrary Covariance. Bernoulli, 28.3 (2022): 1835-1861. [arXiv] [journal] [code] (with Qian Zhao and Emmanuel Candès).
Representation via Representations: Domain generalization via Adversarially Learned Invariant Representations (2020) [arXiv] (with Zhun Deng, Frances Ding, Cynthia Dwork, Rachel Hong, Giovanni Parmigiani and Prasad Patil).
Abstracting Fairness: Oracles, Metrics, and Interpretability. Foundations of Responsible Computing, LIPIcs, Volume 156, FORC 2020. [conference] (with Cynthia Dwork, Christina Ilvento and Guy Rothblum).
The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression. The Annals of Statistics, 48, no. 1 (2020): 27-42. [arXiv] [journal] (with Emmanuel Candès).
A modern maximum-likelihood theory for high-dimensional logistic regression. Proceedings of the National Academy of Sciences 116.29 (2019): 14516-14525. [Supplement] [talk] [code] [arXiv] [journal] (with Emmanuel Candès).
The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square. Probability Theory and Related Fields 175.1 (2019): 487-558. [Supplement] [talk] [arXiv] [journal] (with Yuxin Chen and Emmanuel Candès).
Modeling bimodal discrete data using Conway-Maxwell-Poisson mixture models. Journal of Business & Economic Statistics 33.3 (2015): 352-365. [arXiv] [journal] (with Galit Shmueli, Smarajit Bose and Paromita Dubey).
Fitting COM-Poisson mixtures to bimodal count data. Proceedings of the 2013 International Conference on Information, Operations Management and Statistics (ICIOMS 2013). Winner of Best Paper Award (with Smarajit Bose, Galit Shmueli and Paromita Dubey).
Ph.D. Thesis
A modern maximum likelihood theory for high-dimensional logistic regression (2019). Recipient of the Theodore W. Anderson Theory of Statistics Dissertation Award.