
Fellows Talk
Lin Chen (UC Berkeley)
Zoom
Speaker: Lin Chen (UC Berkeley)
Title: Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
Abstract: In this paper, we investigate the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning (also known as the off-policy evaluation problem) with linear function approximation. We identify a hard regime $d\gamma^2 > 1$, where $d$ is the dimension of the feature vector and $\gamma$ is the discount rate. In this regime, for any $q \in [\gamma^2, 1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$, and it requires a number of samples exponential in $d\gamma^2$ to approximate the value function up to an additive error $\epsilon$. Note that the lower bound of the sample complexity is exponential in $d$. If $q = \gamma^2$, even infinite data cannot suffice. Under the low distribution shift assumption, we show that there is an algorithm that needs at most $\operatorname{poly}(\|\theta^{\pi}\|, 1/\epsilon)$ samples ($\theta^{\pi}$ is the parameter of the policy in linear function approximation) and guarantees approximation to the value function up to an additive error of $\epsilon$ with probability at least $1 - \delta$.
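For readers less familiar with the setting, here is a rough sketch of the off-policy evaluation problem with linear function approximation that the abstract refers to. The notation below ($\phi$ for the feature map, $\mu$ for the offline data distribution, $\Sigma$ for the feature covariance matrix) is illustrative rather than taken from the paper itself.

A policy $\pi$ in a discounted Markov decision process has value function
\[
  V^{\pi}(s) \;=\; \mathbb{E}\left[\,\sum_{t=0}^{\infty} \gamma^{t} r_t \;\middle|\; s_0 = s,\ \pi\,\right], \qquad 0 < \gamma < 1,
\]
and linear function approximation posits a feature map $\phi : \mathcal{S} \to \mathbb{R}^d$ and a parameter vector $\theta^{\pi} \in \mathbb{R}^d$ with
\[
  V^{\pi}(s) \;\approx\; \phi(s)^{\top} \theta^{\pi}.
\]
Given transitions drawn from a fixed offline distribution $\mu$ (rather than from $\pi$ itself), the task is to estimate $V^{\pi}$ up to additive error $\epsilon$. The feature covariance matrix mentioned in the abstract is then $\Sigma = \mathbb{E}_{s \sim \mu}\big[\phi(s)\phi(s)^{\top}\big]$; its smallest eigenvalue measures how well the offline data covers all feature directions.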