CS234 / Lecture 4 - Model-Free Control

Resources:

Control Objectives

On-Policy vs Off-Policy Learning

Model-Free Policy Iteration

Policy Evaluation with Exploration

Greedy in the Limit of Infinite Exploration (GLIE)

Monte Carlo Online / On-Policy Control

Model-Free Policy Iteration with Temporal Difference Methods

Maximization Bias

Double Q-Learning