CS234 / Lecture 3: Model-Free Policy Evaluation

Resources:

Dynamic Programming Policy Evaluation
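Here is a minimal sketch of iterative policy evaluation. It assumes the model is known and that the fixed policy π has already been folded into a state-to-state transition matrix P (P[s][s2] is the probability of moving from s to s2 under π) and an expected reward vector R (R[s] is the expected immediate reward in s under π). Each sweep applies the Bellman expectation backup V_{k+1}(s) = R(s) + γ Σ_{s'} P(s'|s) V_k(s') to every state and stops once the largest change falls below a tolerance. The function name `dp_policy_evaluation` and this input format are illustrative assumptions, not taken from the lecture.

```python
def dp_policy_evaluation(P, R, gamma, tol=1e-8, max_iters=10_000):
    """Iterative policy evaluation with a known model (tabular).

    P[s][s2]: probability of moving from s to s2 under the fixed policy pi
    R[s]:     expected immediate reward in state s under pi
    (this input format is an assumption made for the sketch)
    """
    n = len(R)
    V = [0.0] * n  # V_0 = 0 everywhere
    for _ in range(max_iters):
        # Bellman expectation backup applied to every state (synchronous sweep)
        V_new = [R[s] + gamma * sum(P[s][s2] * V[s2] for s2 in range(n))
                 for s in range(n)]
        delta = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if delta < tol:  # stop once a sweep barely changes the estimate
            break
    return V


# Hypothetical 2-state chain: s0 pays 1 and stays w.p. 0.5; s1 is absorbing with reward 0
print(dp_policy_evaluation([[0.5, 0.5], [0.0, 1.0]], [1.0, 0.0], gamma=1.0))
```

For this hypothetical chain, V(s0) = 1 + 0.5 V(s0) gives V(s0) = 2, and the sweeps approach that value geometrically while V(s1) stays at 0.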

Monte Carlo Policy Evaluation
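Monte Carlo evaluation replaces the model with sampled returns: V^π(s) is estimated by averaging the returns G_t = r_t + γ r_{t+1} + γ² r_{t+2} + ... observed after visiting s. Below is a sketch of the one computation every MC variant needs. It assumes an episode is given as a plain list of rewards r_0, ..., r_{T-1}, and the helper name `discounted_returns` is hypothetical. It uses the recursion G_t = r_t + γ G_{t+1}, so a single backward pass over the episode suffices.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, ..., G_{T-1}] for one finished episode's rewards r_0..r_{T-1}."""
    returns = [0.0] * len(rewards)
    G = 0.0  # return after the final step is 0 (episode has terminated)
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G  # G_t = r_t + gamma * G_{t+1}
        returns[t] = G
    return returns


print(discounted_returns([0, 0, 1], gamma=0.5))  # [0.25, 0.5, 1.0]
```

Because G_t is only known once the episode has terminated, this style of estimate needs episodic domains.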

Bias, Variance, and MSE
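For an estimator θ̂ of a quantity θ, the standard definitions are Bias = E[θ̂] - θ, Var = E[(θ̂ - E[θ̂])²] and MSE = E[(θ̂ - θ)²] = Var + Bias². The sketch below uses a hypothetical helper name and replaces each expectation with an average over repeated estimates. It uses the population variance (dividing by n), so the decomposition holds exactly on the sample.

```python
def bias_variance_mse(estimates, true_value):
    """Empirical bias, variance and MSE of repeated estimates of true_value."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value                                # E[theta_hat] - theta
    variance = sum((x - mean) ** 2 for x in estimates) / n  # E[(theta_hat - E[theta_hat])^2]
    mse = sum((x - true_value) ** 2 for x in estimates) / n  # E[(theta_hat - theta)^2]
    return bias, variance, mse


bias, var, mse = bias_variance_mse([1, 2, 3, 6], true_value=2)
print(bias, var, mse, var + bias ** 2)  # 1.0 3.5 4.5 4.5
```

With estimates [1, 2, 3, 6] of a true value 2, the mean is 3. That gives a bias of 1, a variance of 3.5 and an MSE of 4.5 = 3.5 + 1².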

Back to First Visit Monte Carlo:
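Here is a minimal first-visit MC sketch. It assumes each episode is a list of (state, reward) pairs, where the reward is the one received after visiting that state. For each state, only the return following its first occurrence in an episode is added to the running sum G(s) and count N(s), and the estimate is V(s) = G(s) / N(s). The function name and episode format are illustrative assumptions.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma):
    """First-visit Monte Carlo policy evaluation (tabular).

    Each episode is a list of (state, reward) pairs: reward is r_t received
    after visiting state s_t. This format is an assumption for the sketch.
    """
    total = defaultdict(float)  # G(s): sum of first-visit returns
    count = defaultdict(int)    # N(s): number of first visits
    for episode in episodes:
        # Backward pass: returns[t] = G_t = r_t + gamma * G_{t+1}
        returns = [0.0] * len(episode)
        G = 0.0
        for t in reversed(range(len(episode))):
            G = episode[t][1] + gamma * G
            returns[t] = G
        seen = set()
        for t, (s, _) in enumerate(episode):
            if s in seen:
                continue  # later visits to s in this episode are ignored
            seen.add(s)
            total[s] += returns[t]
            count[s] += 1
    return {s: total[s] / count[s] for s in count}


episode = [("s3", 0), ("s2", 0), ("s2", 0), ("s1", 1)]  # hypothetical trajectory
print(first_visit_mc([episode], gamma=0.5))  # {'s3': 0.125, 's2': 0.25, 's1': 1.0}
```

On this hypothetical episode with γ = 0.5, the per-step returns are 0.125, 0.25, 0.5 and 1.0. The return of 0.5 from the second visit to s2 is therefore ignored.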

Back to Every Visit Monte Carlo:
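Every-visit MC differs only in that every occurrence of a state contributes its return. The sketch below assumes each episode is a list of (state, reward) pairs and uses a hypothetical function name. It also shows the incremental form V(s) ← V(s) + α (G_t - V(s)). With α = 1/N(s) this reproduces the running average exactly, while a constant α would instead weight recent returns more heavily.

```python
from collections import defaultdict


def every_visit_mc(episodes, gamma):
    """Every-visit Monte Carlo policy evaluation with incremental averaging.

    Each episode is a list of (state, reward) pairs, reward r_t received after
    visiting s_t (an assumption for this sketch).
    """
    V = defaultdict(float)  # current value estimates
    N = defaultdict(int)    # visit counts
    for episode in episodes:
        G = 0.0
        # Walk the episode backwards so G_t is available at each step t
        for s, r in reversed(episode):
            G = r + gamma * G
            N[s] += 1
            # Incremental mean: V(s) <- V(s) + (1 / N(s)) * (G_t - V(s))
            V[s] += (G - V[s]) / N[s]
    return dict(V)


episode = [("s3", 0), ("s2", 0), ("s2", 0), ("s1", 1)]  # hypothetical trajectory
print(every_visit_mc([episode], gamma=0.5))  # {'s1': 1.0, 's2': 0.375, 's3': 0.125}
```

On the same hypothetical trajectory, s2 now averages both of its returns, (0.25 + 0.5) / 2 = 0.375.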

Temporal Difference Learning
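TD(0) updates after every transition by bootstrapping from the current estimate of the next state: V(s_t) ← V(s_t) + α (r_t + γ V(s_{t+1}) - V(s_t)). Here is a minimal tabular sketch. It assumes each episode is a list of (s, r, s_next, done) transitions and that terminal states have value 0; the function name and format are hypothetical.

```python
def td0_evaluate(episodes, gamma, alpha):
    """Tabular TD(0) policy evaluation.

    Each episode is a list of (s, r, s_next, done) transitions; done=True means
    s_next is terminal, whose value is taken to be 0. Format is an assumption.
    """
    V = {}  # states not yet seen are treated as having value 0
    for episode in episodes:
        for s, r, s_next, done in episode:
            v_next = 0.0 if done else V.get(s_next, 0.0)
            target = r + gamma * v_next  # bootstrapped TD target
            # Move V(s) a step of size alpha along the TD error (target - V(s))
            V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
    return V


# Hypothetical trajectory s3 -> s2 -> s2 -> s1 -> terminal, reward 1 on the last step
episode = [("s3", 0, "s2", False), ("s2", 0, "s2", False),
           ("s2", 0, "s1", False), ("s1", 1, None, True)]
print(td0_evaluate([episode], gamma=1.0, alpha=1.0))           # {'s3': 0.0, 's2': 0.0, 's1': 1.0}
print(td0_evaluate([episode, episode], gamma=1.0, alpha=1.0))  # {'s3': 0.0, 's2': 1.0, 's1': 1.0}
```

One pass only updates V(s1); a second pass propagates the value one step back to s2. This bootstrapping lets TD update before an episode ends, so it also works in non-episodic domains, but it also makes the estimate biased.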

Dynamic Programming vs Monte Carlo vs Temporal Difference

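|                                      | Dynamic Programming | Monte Carlo      | Temporal Difference |
| ------------------------------------ | ------------------- | ---------------- | ------------------- |
| Usable without a model of the domain |                     | ✅               | ✅                  |
| Usable with non-episodic domains     | ✅                  |                  | ✅                  |
| Handles non-Markovian domains        |                     | ✅               |                     |
| Converges to true value in limit     | ✅                  | ✅               | ✅                  |
| Unbiased estimate of value           |                     | ✅ (first visit) |                     |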

Properties to Evaluate Algorithms: