- RL Study Notes: Actor-Critic Algorithm (6 min, English)
  Overview of RL actor-critic frameworks, covering derivations and updates for QAC, A2C, importance sampling, and DPG.
- RL Study Notes: Policy Gradient Methods (6 min, English)
  Core concepts of RL policy gradient methods: objective functions, the log-derivative trick, the policy gradient theorem derivation, and the REINFORCE algorithm.
- RL Study Notes: Value Function Approximation (5 min, English)
  Summary of value function approximation in RL, covering linear and non-linear forms, state distributions, gradient methods, DQN, and experience replay.
- RL Study Notes: Temporal-Difference Learning (12 min, English)
  A summary of core temporal-difference learning concepts, comparing TD with Monte Carlo and detailing the mechanisms of Sarsa, n-step Sarsa, and Q-learning.
- RL Study Notes: SA and SGD (7 min, English)
  A review of stochastic approximation and the Robbins-Monro algorithm, detailing the evolution, convergence properties, and sampling differences of SGD.
- RL Study Notes: Monte Carlo Methods (3 min, English)
  RL Monte Carlo methods: MC Basic, Exploring Starts, GPI, and epsilon-greedy for model-free optimization.
- RL Study Notes: Value Iteration and Policy Iteration (3 min, English)
  Analyzes value iteration and policy iteration, showing how truncated policy iteration unifies them via the number of evaluation steps.
- RL Study Notes: Bellman Optimality Equation (4 min, English)
  Derives the Bellman optimality equation and its fixed-point properties; analyzes value iteration as a contraction mapping and how models and rewards determine the optimal policy.