cs234 / lecture 1 - introduction to reinforcement learning



Goal: Use data / experience to learn to make good sequences of decisions under uncertainty

Credit Assignment Problem: Determining the causal relationship between actions and the future rewards they lead to

Aspects used to compare learning paradigms: Optimization, Exploration, Generalization, Delayed Consequences

Paradigms compared: Reinforcement Learning, Supervised Machine Learning, Unsupervised Machine Learning, Imitation Learning

Imitation Learning

Learning to do something by observing another agent do that task.



Sequential Decision Making

Goal: Maximize total expected future reward

History: $h_t = (a_1, o_1, r_1, \ldots, a_t, o_t, r_t)$
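
The interaction loop that produces this history can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: `ToyWorld`, `run`, and `follow_last_observation` are hypothetical names, and the world's dynamics (a hidden counter whose parity is observed) are an assumption chosen only to keep the example deterministic.

```python
class ToyWorld:
    """Hypothetical world: a hidden counter; the agent only observes its parity."""

    def __init__(self):
        self.counter = 0  # true world state, hidden from the agent

    def step(self, action):
        # Reward 1 if the action matches the current parity of the hidden counter.
        reward = 1.0 if action == self.counter % 2 else 0.0
        self.counter += 1
        observation = self.counter % 2
        return observation, reward


def run(policy, world, num_steps):
    """Run the agent-world loop and return the history as (a, o, r) triples."""
    history = []  # h_t = (a_1, o_1, r_1, ..., a_t, o_t, r_t)
    for _ in range(num_steps):
        action = policy(history)                  # agent picks an action from the history
        observation, reward = world.step(action)  # world responds
        history.append((action, observation, reward))
    return history


def follow_last_observation(history):
    """Hypothetical policy: repeat the most recent observation (0 on the first step)."""
    return history[-1][1] if history else 0


history = run(follow_last_observation, ToyWorld(), num_steps=4)
total_reward = sum(reward for _, _, reward in history)
```

Here `total_reward` is the quantity the agent wants to make large; in a stochastic world the goal would be its expectation.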

World State: The true state of the world, which determines how the world generates the next state and reward. It is usually unknown to the agent

Markov Assumption: To predict the future, you only need to know the current state (the future is independent of the past given the present):

$$p(s_{t+1} \mid s_t, a_t) = p(s_{t+1} \mid h_t, a_t)$$

Setting the state to the full history always makes the problem Markov, but that is a lot of information to carry $\rightarrow$ in practice, using the most recent observation as the state is often sufficient
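
The choice of state can be checked empirically on a toy problem. The sketch below is an illustration under stated assumptions, not part of the lecture: it invents a ring world of size `RING` in which only the position is observed and the velocity is hidden, there are no actions, and `is_markov` is a hypothetical helper that compares the next-observation distribution given the agent's state with the one given the full history. In this particular world the most recent observation alone is not Markov, the last two observations are, and the full history is Markov by construction.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import product

RING = 5  # hypothetical ring-world size (assumption for this sketch)


def trajectories(horizon=4):
    """Enumerate every observation sequence; position is observed, velocity is hidden."""
    for pos, vel in product(range(RING), (-1, 1)):
        obs = [pos]
        for _ in range(horizon):
            pos = (pos + vel) % RING
            obs.append(pos)
        yield obs


def next_obs_counts(make_state, horizon=4):
    """Count next observations conditioned on make_state(history)."""
    counts = defaultdict(Counter)
    for obs in trajectories(horizon):
        for t in range(1, len(obs)):
            counts[make_state(tuple(obs[:t]))][obs[t]] += 1
    return counts


def normalize(counter):
    """Turn counts into an exact probability distribution."""
    total = sum(counter.values())
    return {k: Fraction(v, total) for k, v in counter.items()}


def is_markov(make_state, horizon=4):
    """True if p(o_{t+1} | state) equals p(o_{t+1} | full history) for every history."""
    by_state = next_obs_counts(make_state, horizon)
    by_history = next_obs_counts(lambda h: h, horizon)
    return all(
        normalize(by_state[make_state(h)]) == normalize(c)
        for h, c in by_history.items()
    )


full_history_ok = is_markov(lambda h: h)    # state = entire history
last_obs_ok = is_markov(lambda h: h[-1])    # state = most recent observation
last_two_ok = is_markov(lambda h: h[-2:])   # state = last two observations
```

The design point is that a compact state works whenever it captures everything in the history that matters for prediction; here two positions are enough to recover the hidden velocity.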


Types of Sequential Decision Processes

RL Algorithm Components

Types of RL Agents

Challenges in RL

Exploration and Exploitation

Evaluation and Control