Action-dependent Control Variates for Policy Optimization via Stein’s Identity
