Title: Effect Size Heterogeneity Matters in High Dimensions
Speaker: Weijie Su, Assistant Professor, University of Pennsylvania
Host: Professor Jinyuan Chang, School of Statistics, Southwestern University of Finance and Economics
Time: Friday, June 5, 2020, 10:00-11:20
Platform and Meeting ID: Tencent Meeting, 872 707 851
Abstract:
In high-dimensional linear regression, would increasing the true effect sizes always lead to better model selection, with all other conditions (such as sparsity) held fixed? In this paper, we answer this question in the negative for the Lasso in a certain sparsity regime, by introducing a new notion we term effect size heterogeneity. Roughly speaking, a regression coefficient vector has high effect size heterogeneity if its nonzero entries have significantly different magnitudes, and vice versa. From the perspective of this new measure, we prove that in the regime of linear sparsity, the false and true positive rates achieve the optimal trade-off uniformly along the Lasso path when this measure is maximal, in the sense that all nonzero effect sizes have very different magnitudes, and the worst-case trade-off is achieved when it is minimal, in the sense that all nonzero effect sizes are about equal. Moreover, we demonstrate that the Lasso path produces an optimal ranking of the explanatory variables, as measured by the rank of the first false variable, when the effect size heterogeneity is maximal, and vice versa. Taken together, these two findings suggest that effect size heterogeneity should serve as a measure complementary to the sparsity of the regression coefficients in the analysis of high-dimensional regression problems. When effect size heterogeneity is low, variables with comparable effect sizes, no matter how large they are, metaphorically compete with one another along the Lasso path, degrading the variable selection performance of the Lasso. Our proofs use techniques from approximate message passing theory as well as a novel argument for estimating the rank of the first false variable.
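The "rank of the first false variable" criterion in the abstract can be probed numerically. The sketch below is a minimal, illustrative simulation (not the paper's experiments, and all parameter choices such as `n`, `p`, `k`, and the decay rate are assumptions): it generates a sparse linear model with either equal nonzero effects (low heterogeneity) or geometrically decaying effects (high heterogeneity), runs the Lasso path with scikit-learn's `lasso_path`, and records the rank at which the first null variable enters the path.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def first_false_rank(X, y, support):
    """Rank (1-based) of the first null variable entering the Lasso path."""
    # lasso_path returns coefficients on a grid of alphas in decreasing
    # order, so the first alpha index at which a coefficient becomes
    # nonzero gives that variable's entry time along the path.
    _, coefs, _ = lasso_path(X, y, n_alphas=200)
    entry = np.full(X.shape[1], np.inf)
    for j in range(X.shape[1]):
        nz = np.nonzero(coefs[j])[0]
        if nz.size:
            entry[j] = nz[0]
    order = np.argsort(entry)  # variables sorted by entry time
    for rank, j in enumerate(order, start=1):
        if j not in support:
            return rank
    return X.shape[1] + 1  # no false variable ever entered

rng = np.random.default_rng(0)
n, p, k = 300, 500, 20                      # illustrative sizes
X = rng.standard_normal((n, p)) / np.sqrt(n)  # unit-norm columns (approx.)
support = set(range(k))

# Low heterogeneity: all nonzero effects equal.
beta_eq = np.zeros(p)
beta_eq[:k] = 10.0
# High heterogeneity: geometrically decaying magnitudes.
beta_het = np.zeros(p)
beta_het[:k] = 10.0 * 0.7 ** np.arange(k)

results = {}
for name, beta in [("equal", beta_eq), ("heterogeneous", beta_het)]:
    y = X @ beta + rng.standard_normal(n)
    results[name] = first_false_rank(X, y, support)
    print(name, results[name])
```

Under the abstract's claim, one would expect the heterogeneous configuration to typically yield a larger first-false rank than the equal-effects one, though any single random draw is only suggestive.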
Speaker Bio:
Weijie Su is an Assistant Professor in the Wharton Statistics Department at the University of Pennsylvania, where he co-directs Penn Research in Machine Learning. Prior to joining Penn, he received his Ph.D. in Statistics from Stanford University in 2016 and his B.S. in Mathematics from Peking University in 2011. His research interests span high-dimensional statistics, mathematical optimization, privacy-preserving data analysis, multiple hypothesis testing, and deep learning theory. He is a recipient of the Theodore W. Anderson Dissertation Award from Stanford in 2016, an NSF CAREER Award in 2019, and an Alfred P. Sloan Research Fellowship and a Facebook Faculty Research Award, both in 2020.