Regularized Centered Emphatic Temporal Difference Learning
arXiv cs.AI · May 7, 2026
temporal-difference, machine-learning, variance-control, reinforcement-learning
The paper introduces Regularized Emphatic Temporal-Difference Learning (RETD), an approach to off-policy temporal-difference learning that addresses the tradeoff among stability, projection geometry, and variance control. Revisiting Bellman-error centering, the authors show how naive implementations can undermine the positive-definiteness of the key matrix in Emphatic TD. RETD aims to retain the benefits of follow-on emphasis while mitigating the high variance of the auxiliary centering recursion.
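To make the moving parts concrete, below is a minimal sketch of one step of an emphatic TD(0) update with a centered Bellman error and a regularized auxiliary centering recursion. Everything specific here is an illustrative assumption rather than the paper's algorithm: linear function approximation, interest i(s) = 1, lambda = 0, a scalar centering term `u`, and a hypothetical shrinkage coefficient `eta` standing in for the regularization.

```python
import numpy as np

def retd_step(w, u, F, x, r, x_next, rho_prev, rho, gamma, alpha, beta, eta):
    """One illustrative step of regularized, centered Emphatic TD(0).

    A sketch under stated assumptions, not the paper's exact method:
    linear values, interest i(s) = 1, lambda = 0, a scalar centering
    term u, and a shrinkage coefficient eta (hypothetical) regularizing
    the auxiliary centering recursion to damp its variance.
    """
    # Follow-on trace: F_t = gamma * rho_{t-1} * F_{t-1} + i(S_t).
    F = gamma * rho_prev * F + 1.0
    M = F  # emphasis; equals the follow-on trace when lambda = 0

    # Ordinary TD error under the current linear value estimates.
    delta = r + gamma * w @ x_next - w @ x

    # Auxiliary centering recursion: track the running-average TD error
    # at rate beta, with eta shrinking u toward zero each step.
    u = (1.0 - eta) * (u + beta * (delta - u))

    # Emphatic update using the centered Bellman error (delta - u).
    w = w + alpha * rho * M * (delta - u) * x
    return w, u, F
```

Shrinking `u` toward zero keeps the centering estimate from drifting over long off-policy trajectories, at the cost of a small bias; this is one plausible reading of how regularization could control the recursion's variance, not a claim about the authors' construction.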