Reinforcement Learning
11 modules · ~36 hours · Intermediate → Advanced
Master Reinforcement Learning from theory to practice: Markov Decision Processes, dynamic programming, Q-learning, policy gradients, actor-critic, and modern deep RL (DQN, PPO, SAC) with PyTorch and Gymnasium.
Course roadmap
| # | Module | Status | Topics |
|---|---|---|---|
| 0 | Setup & RL Vocabulary | Plan ready | Agent, environment, reward, state, action, policy, return, episode |
| 1 | Markov Decision Processes | Plan ready | MDPs, Bellman equations, value functions, policies |
| 2 | Dynamic Programming | Plan ready | Policy iteration, value iteration, planning with a known model |
| 3 | Monte Carlo & Temporal Difference | Plan ready | MC prediction, TD(0), SARSA, Q-learning |
| 4 | Function Approximation | Plan ready | Linear FA, neural net FA, deadly triad |
| 5 | Deep Q-Networks | Plan ready | DQN, replay buffer, target net, Double DQN, Dueling DQN |
| 6 | Policy Gradient Methods | Plan ready | REINFORCE, baselines, actor-critic, A2C, A3C |
| 7 | Trust Region Methods | Plan ready | TRPO, PPO, GAE, clipping |
| 8 | Continuous Control | Plan ready | DDPG, TD3, SAC, exploration noise |
| 9 | Advanced Topics | Plan ready | Multi-agent RL, offline RL, model-based RL, RLHF for LLMs |
| 10 | Capstone | Plan ready | Train an agent on a Gymnasium env: MountainCar → LunarLander → custom env |
What's available now
The curriculum plan is published. Course content is scheduled to roll out in the second half of 2026.
Related courses:
- Machine Learning — supervised learning prerequisites
- llm-development — RLHF and DPO for LLMs
Last updated
2026-05 — Curriculum plan published.