Reinforcement Learning
- Trust Region Policy Optimization. (TRPO)
- Scalable Trust-Region Method for Deep Reinforcement Learning Using Kronecker-Factored Approximation. (ACKTR)
- Proximal Policy Optimization Algorithms. (PPO)
- Deterministic Policy Gradient Algorithms. (DPG)
- Continuous Control with Deep Reinforcement Learning. (DDPG)
- Addressing Function Approximation Error in Actor-Critic Methods. (TD3)
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. (SAC)
