Research

Contributions

My research develops theoretical and methodological foundations for high-dimensional problems in statistics and machine learning. During my Ph.D., I showed that classical likelihood-based inference techniques yield highly inaccurate uncertainty measures in moderate to high dimensions, rendering p-values and confidence intervals from standard statistical packages unreliable (see illustrations here). To remedy this, I introduced a modern maximum likelihood theory (with a focus on generalized linear models) that provides valid inference in high dimensions, resolving the issues with classical procedures. Since then, my focus has centered on dependent data, causal inference, and modern machine learning (ML).
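As a small illustration of the phenomenon described above, the sketch below (my own toy simulation, not code from any of the papers listed here) fits an unpenalized logistic regression with p/n = 0.2 and compares the fitted coefficients to the truth; the sample size, signal strength, and seed are illustrative assumptions.

```python
# Toy simulation (illustrative assumptions throughout): when p/n is not
# negligible, the unpenalized logistic MLE systematically over-estimates the
# true effect sizes, so classical Wald p-values and confidence intervals
# computed by standard software are miscalibrated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 4000, 800                               # p / n = 0.2
gamma2 = 5.0                                   # signal strength Var(x' beta)
beta = np.zeros(p)
beta[: p // 2] = np.sqrt(2 * gamma2 / p)       # half the coefficients are nonzero

X = rng.standard_normal((n, p))                # i.i.d. N(0, 1) covariates
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))

mle = sm.Logit(y, X).fit(disp=0, maxiter=200)  # unpenalized maximum likelihood
inflation = mle.params[: p // 2].mean() / beta[0]
print(f"average inflation of the nonzero coefficients: {inflation:.2f}")
# Classical theory predicts a ratio near 1; in this regime it is markedly
# larger, consistent with the bias characterized in the high-dimensional
# maximum likelihood theory described above.
```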

In recent years, my work has introduced an eigenvalue-based framework for high-dimensional inference under structured dependence and potentially heavy-tailed covariates. I have also developed novel central limit theorems that quantify uncertainty in two-stage estimation, with applications to causal inference. On the ML front, I have established precise high-dimensional theories that capture the prediction behavior of popular ML algorithms and classifiers. The technical contributions in my work draw on insights from high-dimensional probability, optimization theory, and statistical physics.

Preprints and Publications

Predictive Inference in Multi-Environment Scenarios (2024+) In Review at Statistical Science [arXiv] (with John Duchi, Suyash Gupta and Kuanhao Jiang).

Universality in block dependent linear models with applications to nonparametric regression (2023+) In Review at IEEE Transactions on Information Theory [arXiv] (with Samriddha Lahiry).

Spectrum-Aware Adjustment: A New Debiasing Framework with Applications to Principal Components Regression (2023+) In Submission [talk][arXiv] (with Yufan Li).

High-dimensional Asymptotics of Langevin Dynamics in Spiked Matrix Models (2023) Information and Inference: A Journal of the IMA, Volume 12, Issue 4, December 2023, Pages 2720–2752 [arXiv] [journal] (with Tengyuan Liang and Subhabrata Sen).

A New Central Limit Theorem for the Augmented IPW Estimator: Variance Inflation, Cross-fit Covariance and Beyond (2022+) The Annals of Statistics (Major Revision) [talk] [arXiv] (with Kuanhao Jiang, Rajarshi Mukherjee and Subhabrata Sen). Kuanhao Jiang won the 2022 New England Statistical Society’s Student Research Award for this work.

Multi-study boosting: Theoretical Considerations for Merging vs. Ensembling (2022+) In Review at JMLR. [arXiv] (with Cathy Shyr, Giovanni Parmigiani and Prasad Patil).

A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models. Neural Information Processing Systems (NeurIPS) 2022. [NeurIPS version] (with Lijia Zhou, Frederic Koehler, Danica J. Sutherland and Nathan Srebro).

A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers. The Annals of Statistics 50.3 (2022): 1669-1695. [talk] [arXiv] [journal] (with Tengyuan Liang).

The Asymptotic Distribution of the MLE in High-dimensional Logistic Models: Arbitrary Covariance. Bernoulli 28.3 (2022): 1835-1861. [arXiv] [journal] [code] (with Qian Zhao and Emmanuel Candès).

Representation via Representations: Domain generalization via Adversarially Learned Invariant Representations (2020) [arXiv] (with Zhun Deng, Frances Ding, Cynthia Dwork, Rachel Hong, Giovanni Parmigiani and Prasad Patil).

Abstracting Fairness: Oracles, Metrics, and Interpretability. Foundations of Responsible Computing (FORC 2020), LIPIcs, Volume 156. [conference] (with Cynthia Dwork, Christina Ilvento and Guy Rothblum).

The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression. The Annals of Statistics 48.1 (2020): 27-42. [arXiv] [journal] (with Emmanuel Candès).

A modern maximum-likelihood theory for high-dimensional logistic regression. Proceedings of the National Academy of Sciences 116.29 (2019): 14516-14525. [Supplement] [talk] [arXiv] [journal] (with Emmanuel Candès).

The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square. Probability Theory and Related Fields 175.1 (2019): 487-558. [Supplement] [talk] [arXiv] [journal] (with Yuxin Chen and Emmanuel Candès).

Modeling bimodal discrete data using Conway-Maxwell-Poisson mixture models. Journal of Business & Economic Statistics 33.3 (2015): 352-365. [arXiv] [journal] (with Galit Shmueli, Smarajit Bose and Paromita Dubey).

Fitting COM-Poisson mixtures to bimodal count data. Proceedings of the 2013 International Conference on Information, Operations Management and Statistics (ICIOMS 2013). Winner of Best Paper Award (with Smarajit Bose, Galit Shmueli and Paromita Dubey).

Ph.D. Thesis

A modern maximum likelihood theory for high-dimensional logistic regression (2019). Recipient of the Theodore W. Anderson Theory of Statistics Dissertation Award.