CS234 / Lecture 2 - Given a Model of the World

Resources:

Markov Process and Markov Chains

Markov Reward Process

Return and Value Function

Discount Factor

Computing Value of a Markov Reward Process

$$ \begin{pmatrix} V(s_1) \\ \vdots \\ V(s_N) \end{pmatrix}
= \begin{pmatrix} R(s_1) \\ \vdots \\ R(s_N) \end{pmatrix}
+ \gamma \begin{pmatrix} P(s_1 \vert s_1) & P(s_2 \vert s_1) & \dots & P(s_N \vert s_1) \\ \vdots & \vdots & \ddots & \vdots \\ P(s_1 \vert s_N) & P(s_2 \vert s_N) & \dots & P(s_N \vert s_N) \end{pmatrix}
\begin{pmatrix} V(s_1) \\ \vdots \\ V(s_N) \end{pmatrix} $$
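
The matrix form $V = R + \gamma P V$ can be evaluated numerically. Below is a minimal sketch in plain Python that repeatedly applies the right-hand side until the values stop changing. It assumes `P[i][j]` stores the probability of moving from state $s_i$ to state $s_j$ (so each row sums to 1) and that $\gamma < 1$. The function name `mrp_value`, the tolerance, and the two-state example numbers are made up for illustration and are not from the lecture.

```python
def mrp_value(R, P, gamma, tol=1e-10, max_iters=10_000):
    """Approximate V satisfying V = R + gamma * P V by fixed-point iteration.

    R: list of per-state rewards, length N.
    P: N x N list of lists, P[i][j] = probability of s_i -> s_j (assumed layout).
    gamma: discount factor, assumed to be in [0, 1).
    """
    n = len(R)
    V = [0.0] * n  # start from the all-zero value estimate
    for _ in range(max_iters):
        # One backup: V_new(s_i) = R(s_i) + gamma * sum_j P(s_j | s_i) V(s_j)
        V_new = [
            R[i] + gamma * sum(P[i][j] * V[j] for j in range(n))
            for i in range(n)
        ]
        # Stop once the largest per-state change is below the tolerance
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical two-state MRP: s_1 gives reward 1 and either stays or moves to
# s_2 with equal probability; s_2 is absorbing with reward 0.
R = [1.0, 0.0]
P = [[0.5, 0.5],
     [0.0, 1.0]]
gamma = 0.9

V = mrp_value(R, P, gamma)
print(V)
```

For this example the equation can also be solved by hand: $V(s_2) = 0$ and $V(s_1) = 1 + 0.9 \cdot 0.5 \cdot V(s_1)$, so $V(s_1) = 1 / 0.55$. For small state spaces the same system could instead be solved directly as $V = (I - \gamma P)^{-1} R$ with a linear solver.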

Markov Decision Processes

Policies

Policy Evaluation

MDP Control

MDP Policy Iteration

Value Iteration