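【Speaker Bio】: Yafei Wang (王亚飞) is an Assistant Professor in the Department of Mathematical and Statistical Sciences at the University of Alberta. He received his PhD in Statistics from Beijing University of Technology in 2019. He was a postdoctoral researcher at the University of Alberta, Canada (2019–2022) and a Lecturer at the University of Essex, UK (2022–2023). His research interests include functional data analysis, robust statistics, distributionally robust optimization, decentralized learning, and reinforcement learning. He has published more than ten papers in journals and conferences including Bernoulli, NeurIPS, AAAI, and ICML.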
【Abstract】: Recent developments in reinforcement learning have significantly improved sequential decision-making in uncertain environments. Although existing methods enjoy favorable performance guarantees, prior work has concentrated primarily on characterizing their regret and convergence rates, with less attention given to their asymptotic behavior and to inference procedures. These latter aspects, however, are important for quantifying the inherent uncertainty and variability in practical applications. In this work, we study statistical inference for the policy gradient method in noisy linear quadratic reinforcement learning over a finite time horizon, where linear dynamics with known and unknown drift parameters are controlled subject to a quadratic cost. In particular, we develop the theoretical foundations of statistical inference and establish exact asymptotics for the policy gradient estimators. We propose a principled inference procedure that uses online bootstrapping to construct confidence intervals for the optimal policy obtained. Numerical experiments demonstrate the efficacy of the proposed method for noisy linear dynamical systems under various settings.
【Time】: July 9, 2025, 10:40-11:20
【Venue】: Room 110, Chongzhen Building (崇真楼)