Core concepts of policy gradient methods in reinforcement learning: the objective function, the log-derivative trick, the derivation of the policy gradient theorem, and the REINFORCE algorithm.
reinforcement learning
policy gradient
reinforce
study notes