Interesting research papers I have read (and my notes):
- Universal Value Function Approximators
- Progressive Neural Networks
- Variational Option Discovery Algorithms
- Diversity is All You Need: Learning Skills without a Reward Function
- Variational Intrinsic Control
- Exploration by Random Network Distillation
- Curiosity-driven Exploration by Self-supervised Prediction
- EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- Count-Based Exploration with Neural Density Models
- Recurrent World Models Facilitate Policy Evolution
- Unifying Count-Based Exploration and Intrinsic Motivation
- VIME: Variational Information Maximizing Exploration
- Evolution Strategies as a Scalable Alternative to Reinforcement Learning
- Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
- The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
- Combining Policy Gradient and Q-learning
- Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
- Decision Transformer: Reinforcement Learning via Sequence Modeling
- Bridging the Gap Between Value and Policy Based Reinforcement Learning
- Action-dependent Control Variates for Policy Optimization via Stein’s Identity
- Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
- Implicit Quantile Networks for Distributional Reinforcement Learning
- Distributional Reinforcement Learning with Quantile Regression
- A Distributional Perspective on Reinforcement Learning
- Addressing Function Approximation Error in Actor-Critic Methods
- Continuous Control with Deep Reinforcement Learning
- Deterministic Policy Gradient Algorithms
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
- Sample Efficient Actor-Critic with Experience Replay
- Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
- Proximal Policy Optimization Algorithms
- Emergence of Locomotion Behaviours in Rich Environments
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Trust Region Policy Optimization
- Asynchronous Methods for Deep Reinforcement Learning
- Rainbow: Combining Improvements in Deep Reinforcement Learning
- Prioritized Experience Replay
- Deep Reinforcement Learning with Double Q-learning
- Dueling Network Architectures for Deep Reinforcement Learning
- Deep Recurrent Q-Learning for Partially Observable MDPs
- Playing Atari With Deep Reinforcement Learning
- Extensibility, Safety, and Performance in the SPIN Operating System
- On Micro-Kernel Construction
- Exokernel: An Operating System Architecture for Application-Level Resource Management