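【Speaker Bio】: Dehan Kong (孔德含) is an Associate Professor of Statistics at the University of Toronto. His research interests include brain imaging, statistical genetics and genomics, functional data analysis, causal inference, high-dimensional data analysis, and machine learning. His work has been published in leading international statistics journals, including JRSSB, JASA, and Biometrika, and he currently serves as an Associate Editor of JASA.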
【Abstract】: Recent developments in reinforcement learning have significantly improved sequential decision-making in uncertain environments. Although existing methods come with favorable performance guarantees, prior work has concentrated primarily on characterizing their regret and convergence rates, with less attention given to their asymptotic behavior and to inference procedures. These latter aspects, however, are important for quantifying the inherent uncertainty and variability in practical applications. In this work, we studied statistical inference for the policy gradient method in the noisy linear quadratic reinforcement learning problem over a finite time horizon, where linear dynamics with known and unknown drift parameters are controlled subject to a quadratic cost. In particular, we studied the theoretical foundations of statistical inference and established exact asymptotics for the policy gradient estimators. We proposed a principled inference procedure that uses online bootstrapping techniques to construct a confidence interval for the obtained optimal policy. Numerical experiments demonstrated the efficacy of the proposed method for noisy linear dynamic systems under various settings.
【Time】: July 21, 2025, 10:30-11:30
【Venue】: Room 110, Chongzhen Building (崇真楼)