Reinforcement Learning

  1. Trust Region Policy Optimization. (TRPO)
  2. Scalable Trust-Region Method for Deep Reinforcement Learning Using Kronecker-Factored Approximation. (ACKTR)
  3. Proximal Policy Optimization Algorithms. (PPO)
  4. Deterministic Policy Gradient Algorithms. (DPG)
  5. Continuous Control with Deep Reinforcement Learning. (DDPG)
  6. Addressing Function Approximation Error in Actor-Critic Methods. (TD3)
  7. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. (SAC)