Title: A Two-Stage Optimal Subsampling Estimation for Missing Data Problems with Large-Scale Data
Speaker: Research Professor Wang Qihua, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Host: Professor Chang Jinyuan, School of Statistics
Time: Friday, April 15, 2022, 15:00-16:00
Venue: Tencent Meeting, ID 436 882 467
Abstract:
Subsampling is useful for reducing data volume and speeding up computation with large-scale data, and it is well studied for completely observed data. In the presence of missing data, computation is more challenging, and subsampling becomes both more crucial and more complex. However, subsampling for missing data problems has received little study. This paper fills the gap by studying a subsampling method for a widely used missing data estimator, the augmented inverse probability weighting (AIPW) estimator. The problem of estimating the response mean with missing responses is discussed for illustration. A two-stage subsampling method is proposed within a Poisson sampling framework. A small subsample of expected size $n_{1}$ is used in the first stage to estimate the parameters in the propensity score and the outcome regression models, while a larger subsample of expected size $n_{2}$ is used in the computationally simple second stage to calculate the final estimator. An attractive property of the resulting estimator is that its convergence rate is $n_{2}^{-1/2}$ rather than $n_{1}^{-1/2}$ when both the propensity score and the outcome regression functions are correctly specified. The rate $n_{2}^{-1/2}$ is still attainable in some important cases if only one of the two functions is correctly specified. This indicates that using a small subsample in the computationally complex first stage can reduce the computational burden with little impact on statistical accuracy. Asymptotic normality of the resulting estimator is established, and the optimal subsampling probability is derived by minimizing the asymptotic variance of the resulting estimator. Simulations and a real data analysis are conducted to demonstrate the empirical performance of the resulting estimator.
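The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation: it assumes uniform Poisson subsampling probabilities (not the optimal probabilities derived in the paper), a logistic propensity score model, and a linear outcome regression model. The function names `fit_logistic` and `two_stage_aipw` and the simulated data are hypothetical and introduced here for illustration only.

```python
import numpy as np

def fit_logistic(X, d, n_iter=25):
    """Fit a logistic regression by Newton-Raphson (assumed propensity model)."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        grad = X.T @ (d - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        coef += np.linalg.solve(hess, grad)
    return coef

def two_stage_aipw(X, y, delta, n1, n2, rng):
    """Two-stage subsampled AIPW estimate of E[Y].

    y may hold NaN where the response is missing (delta == 0).
    Uniform Poisson sampling is an assumption of this sketch.
    """
    N = X.shape[0]
    # Stage 1: small Poisson subsample (expected size n1) to fit both models.
    s1 = rng.random(N) < n1 / N
    alpha = fit_logistic(X[s1], delta[s1])
    obs = s1 & (delta == 1)
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    # Stage 2: larger Poisson subsample (expected size n2), cheap plug-in step.
    pi2 = np.full(N, n2 / N)            # inclusion probabilities
    s2 = rng.random(N) < pi2
    ps = 1.0 / (1.0 + np.exp(-X[s2] @ alpha))   # estimated propensity scores
    m = X[s2] @ beta                             # estimated outcome regression
    d2 = delta[s2]
    y2 = np.where(d2 == 1, y[s2], 0.0)           # missing responses never enter
    aipw = d2 * y2 / ps + (1.0 - d2 / ps) * m
    w = 1.0 / pi2[s2]                            # inverse-probability weights
    return np.sum(w * aipw) / np.sum(w)          # normalized weighted mean

# Simulated example (hypothetical data; the true response mean is 1).
rng = np.random.default_rng(2022)
N = 100_000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y_full = 1.0 + 2.0 * x + rng.normal(size=N)
delta = (rng.random(N) < 1.0 / (1.0 + np.exp(-(0.5 + x)))).astype(float)
y = np.where(delta == 1, y_full, np.nan)
print(two_stage_aipw(X, y, delta, n1=1_000, n2=10_000, rng=rng))
```

In this sketch the iterative model fitting touches only about $n_{1}$ rows, while the second stage is a single weighted average over about $n_{2}$ rows, which mirrors the computational split described in the abstract.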
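About the speaker:
Wang Qihua is a research professor and doctoral supervisor at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences. He is a recipient of the National Science Fund for Distinguished Young Scholars, has been selected for a national-level talent program, and was selected for the "Hundred Talents Program" of the Chinese Academy of Sciences. He has taught at Peking University and the University of Hong Kong, and has visited more than ten world-class universities in Canada, the United States, Germany, and Australia. His research focuses on missing data analysis, statistical analysis of high-dimensional data, empirical likelihood inference for complex data, large-scale data, and nonparametric and semiparametric statistical inference. In particular, his work on empirical likelihood for incomplete data has had broad academic influence and has prompted a continuing series of related studies over the past 20 years. He has published three monographs and more than 140 papers in leading international journals, including The Annals of Statistics, JASA, and Biometrika. He has served as principal investigator of a Key Program project of the National Natural Science Foundation of China and has taken part, as a core member, in two of the Foundation's Innovative Research Group projects. He is President of the High-Dimensional Statistics Branch, Vice President of the Survival Analysis Branch, and an Executive Council Member of the Chinese Association for Applied Statistics, and he serves on the editorial boards of several international and domestic academic journals.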