
Academic Seminar No. 41, Joint Laboratory of Data Science and Business Intelligence

Title: Adaptive estimation in multivariate response regression with hidden variables

Speaker: Yang Ning, Assistant Professor, Cornell University

Host: Professor Jinyuan Chang, School of Statistics

Time: 10:00-11:00 a.m., Friday, March 12, 2021

Platform and Meeting ID: Tencent Meeting, 745 780 425

Abstract:

A prominent concern of scientific investigators is the presence of unobserved hidden variables in association analysis. Ignoring hidden variables often yields biased statistical results and misleading scientific conclusions. Motivated by this practical issue, this paper studies the multivariate response regression with hidden variables, Y = (Ψ*)^T X + (B*)^T Z + E, where Y ∈ R^m is the response vector, X ∈ R^p is the observable feature vector, Z ∈ R^K represents the vector of unobserved hidden variables, possibly correlated with X, and E is an independent error. The number of hidden variables K is unknown, and both m and p are allowed, but not required, to grow with the sample size n. Though Ψ* is shown to be non-identifiable due to the presence of hidden variables, we propose to identify the projection of Ψ* onto the orthogonal complement of the row space of B*, denoted by Θ*. The quantity (Θ*)^T X measures the effect of X on Y that cannot be explained through the hidden variables, and thus Θ* is treated as the parameter of interest. Motivated by the identifiability proof, we propose a novel estimation algorithm for Θ*, called HIVE, under homoscedastic errors. The first step of the algorithm estimates the best linear prediction of Y given X, in which the unknown coefficient matrix exhibits an additive decomposition of Ψ* and a dense matrix due to the correlation between X and Z. Under the sparsity assumption on Ψ*, we propose to minimize a penalized least squares loss by regularizing Ψ* and the dense matrix via group-lasso and multivariate ridge, respectively. Non-asymptotic deviation bounds of the in-sample prediction error are established. Our second step estimates the row space of B* by leveraging the covariance structure of the residual vector from the first step. In the last step, we estimate Θ* via projecting Y onto the orthogonal complement of the estimated row space of B* to remove the effect of hidden variables. Non-asymptotic error bounds of our final estimator of Θ*, which are valid for any m, p, K and n, are established. We further show that, under mild assumptions, the rate of our estimator matches the best possible rate with known B* and is adaptive to the unknown sparsity of Θ* induced by the sparsity of Ψ*. The model identifiability, estimation algorithm and statistical guarantees are further extended to the setting with heteroscedastic errors. Thorough numerical simulations and two real data examples are provided to back up our theoretical results.
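    To make the three steps described in the abstract concrete, the following is a minimal Python sketch of the procedure; it is illustrative only and not the authors' implementation. The function name hive_sketch is hypothetical, scikit-learn's MultiTaskLasso is used as a simplified stand-in for the paper's group-lasso plus multivariate-ridge first step, and the number of hidden variables K is treated as known here even though the paper handles unknown K.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso  # stand-in for the group-lasso/ridge step


def hive_sketch(X, Y, K, alpha=0.1):
    """Rough sketch of the three HIVE steps (illustrative only).

    X : (n, p) observed features; Y : (n, m) responses;
    K : assumed number of hidden variables (treated as known here).
    """
    n, m = Y.shape

    # Step 1: estimate the best linear prediction of Y given X.
    # The paper regularizes a sparse part and a dense part separately;
    # a multi-task lasso is used here as a simplified stand-in.
    first_step = MultiTaskLasso(alpha=alpha, fit_intercept=False).fit(X, Y)
    F_hat = first_step.coef_.T            # (p, m) coefficient estimate
    R = Y - X @ F_hat                     # (n, m) residuals

    # Step 2: the residual covariance is roughly B^T Cov(Z | X) B plus noise,
    # so its top-K eigenvectors estimate the row space of B.
    S = (R.T @ R) / n                     # (m, m) residual covariance
    _, eigvecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    V = eigvecs[:, -K:]                   # (m, K) top-K eigenvectors

    # Step 3: project Y onto the orthogonal complement of the estimated
    # row space of B, then re-estimate the regression coefficient.
    P_perp = np.eye(m) - V @ V.T
    Y_tilde = Y @ P_perp
    refit = MultiTaskLasso(alpha=alpha, fit_intercept=False).fit(X, Y_tilde)
    return refit.coef_.T                  # (p, m) estimate of Theta
```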

About the speaker:

Dr. Ning is an assistant professor in the Department of Statistics and Data Science at Cornell University. Prior to joining Cornell University, he was a postdoctoral researcher at Princeton University. He received his Ph.D. in Biostatistics from Johns Hopkins University. His research focuses on high-dimensional statistics and causal inference, with applications to biology, medicine and public health.
