This tutorial reviews techniques for planning and learning in Markov Decision Processes (MDPs) with linear function approximation of the value function. Two major paradigms for finding optimal policies were considered: dynamic programming (DP) techniques for planning and reinforcement learning (RL).
This tutorial reviews techniques for planning and learning in Markov Decision Processes (MDPs) with linear function approximation of the value function. Two major paradigms for finding optimal policies were considered: dynamic programming (DP) techniques for planning and reinforcement learning (RL).