- study notes (10)
- reinforcement learning (9)
- xss (4)
- notes (4)
- web (4)
- value iteration (2)
- bellman equation (1)
- math (1)
- bellman optimality (1)
- policy iteration (1)
- truncated policy iteration (1)
- mdp (1)
- math basics (1)
- monte carlo methods (1)
- gpi (1)
- epsilon-greedy (1)
- stochastic approximation (1)
- sgd (1)
- robbins-monro (1)
- optimization (1)
- value function approximation (1)
- dqn (1)
- td learning (1)
- sarsa (1)
- q-learning (1)
- actor-critic (1)
- a2c (1)
- dpg (1)
- policy gradient (1)
- reinforce (1)
- astro (1)
- cloudflare (1)
- pitfalls (1)
- tinkering (1)