9 Improved Temporal Difference Methods with Linear Function Approximation (2009)
Abstract
Editor’s Summary: This chapter considers temporal difference algorithms in the context of infinite-horizon, finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional discussion of such problems can be found in Chapters 6 and 12. The method presented here is the first iterative temporal difference method that converges without requiring a diminishing step size. The chapter discusses the connections with Sutton’s TD(λ) and with various versions of least-squares methods that are based on value iteration. Both analysis and experiments show that the proposed method is substantially faster, simpler, and more reliable than TD(λ). Comparisons are also made with the LSTD method of Boyan and Bradtke and Barto.

In this paper, we analyze methods for approximate evaluation of the cost-to-go function of a stationary Markov chain within the framework of infinite-horizon discounted dynamic programming. We denote the states by 1,..., n, the transition probabilities
Publication details