- RL Study Notes: Actor-Critic Algorithm (6 min, English)
  Overview of RL actor-critic frameworks, covering derivations and updates for QAC, A2C, importance sampling, and DPG.
- RL Study Notes: Policy Gradient Methods (6 min, English)
  Core concepts of RL policy gradient methods: objective functions, the log-derivative trick, the policy gradient theorem derivation, and the REINFORCE algorithm.
- RL Study Notes: Value Function Approximation (5 min, English)
  Summary of value function approximation in RL, covering linear and non-linear forms, state distributions, gradient methods, DQN, and experience replay.
- RL Study Notes: Temporal-Difference Learning (12 min, English)
  A summary of core temporal-difference learning concepts, comparing TD with Monte Carlo and detailing the mechanisms of Sarsa, n-step Sarsa, and Q-learning.
- RL Study Notes: SA and SGD (7 min, English)
  A review of stochastic approximation and the Robbins-Monro algorithm, detailing the evolution, convergence properties, and sampling differences of SGD.
- RL Study Notes: Monte Carlo Methods (3 min, English)
  RL Monte Carlo methods: MC Basic, Exploring Starts, GPI, and epsilon-greedy for model-free optimization.
- RL Study Notes: Value Iteration and Policy Iteration (3 min, English)
  Analyzes value iteration and policy iteration, showing how truncated policy iteration unifies them via the number of evaluation steps.
- RL Study Notes: Bellman Optimality Equation (4 min, English)
  Derives the Bellman optimality equation and its fixed-point properties; analyzes value iteration as a contraction mapping and how models and rewards determine the optimal policy.