Admissions Open for JANUARY Batch
Learn the math of trial-and-error learning through rewards and policies.
Days : Tue & Thu
Duration : 4 Hours
Timings: 8 - 10 PM IST
Try risk-free: 15-day money-back guarantee
Maths for Reinforcement Learning
Online Live Instructor-Led Learning
By the end of this course
Get stronger in
- Reward function and policy gradient math
- Q-learning and value iteration
Get familiar with
- Markov Decision Processes (MDPs)
- Exploration vs. exploitation concepts
New Batch Starts: Jan 2026
Limited seats: only 15 students per batch
Who Should Enroll?
This course is for learners interested in reinforcement learning, focusing on mathematical principles behind decision-making, rewards, Markov processes, and policy optimization for specialized AI applications.
Prerequisites
Working knowledge of deep learning math and probability.
Experience our course risk-free
We offer a 15-day money-back guarantee.
Course Contents
Topics
- Markov Decision Processes (MDPs): states, actions, rewards
- Transition probabilities and reward functions
- Discounting and return formulation
Key Outcomes
Formalize RL problems mathematically
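To give a flavor of the return formulation listed above, here is a minimal sketch of the discounted return, G_t = r_t + gamma * G_(t+1), computed backwards over one episode. The function name `discounted_returns` and the sample rewards are illustrative assumptions, not course material.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return the discounted return G_t for every step t of one episode.

    Uses the recursion G_t = r_t + gamma * G_{t+1}, with G = 0 after the
    final step, so the list is filled from the last step backwards.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical three-step episode with a reward of 1 at each step.
example = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

With gamma = 0.5, later rewards count for less: the return from the first step is 1 + 0.5 + 0.25 = 1.75.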
Topics
- Bellman expectation and optimality equations
- Policy and value iteration math
- Convergence properties and proofs
Key Outcomes
Solve for optimal policies and value functions
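As a rough illustration of the value iteration topic above, the sketch below repeatedly applies the Bellman optimality backup to a tiny two-state MDP. The MDP itself (`TOY_MDP`), the helper `q_value`, and the tolerance are hypothetical choices made for this example only.

```python
def q_value(P, V, s, a, gamma):
    """Expected one-step return of action a in state s under value estimate V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Run value iteration; P[s][a] is a list of (prob, next_state, reward)."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            # Bellman optimality backup (in-place, so later states see new values).
            best = max(q_value(P, V, s, a, gamma) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [
        max(range(len(P[s])), key=lambda a: q_value(P, V, s, a, gamma))
        for s in range(len(P))
    ]
    return V, policy


# Hypothetical deterministic MDP: action 0 stays put, action 1 switches state.
TOY_MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1
]
V, policy = value_iteration(TOY_MDP)
```

Here staying in state 1 earns 2 per step, so its value converges to 2 / (1 - 0.9) = 20, and the best move from state 0 is to switch, worth 1 + 0.9 * 20 = 19.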
Topics
- TD learning derivations
- Q-learning update rule derivation
- SARSA math
- Gradient-based policy optimization
Key Outcomes
Derive and implement RL algorithms
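The difference between the Q-learning and SARSA update rules named above can be sketched in a few lines for the tabular case. The function names, the nested-list Q-table, and the step sizes are assumptions for illustration.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the best action in s_next."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually taken next."""
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


# Hypothetical Q-tables: 2 states x 2 actions, same starting values.
Q_off = [[0.0, 0.0], [1.0, 2.0]]
Q_on = [[0.0, 0.0], [1.0, 2.0]]
new_q_learning = q_learning_update(Q_off, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
new_sarsa = sarsa_update(Q_on, s=0, a=1, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.5)
```

Both rules move Q(s, a) toward a TD target; they differ only in which next-state value forms that target.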
Topics
- REINFORCE algorithm math
- Baseline subtraction derivation
- Actor-Critic formulation
- Advantage estimation math (GAE)
Key Outcomes
Understand and optimize actor-critic methods
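For the advantage estimation topic above, a minimal sketch of Generalized Advantage Estimation (GAE) follows. It assumes a single episode, a `values` list with one extra bootstrap entry at the end, and illustrative names throughout.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    `values` must hold len(rewards) + 1 entries; the last one is the value
    estimate of the state reached after the final reward (0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Hypothetical two-step trajectory with all value estimates set to zero.
example_adv = gae([1.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5, lam=0.5)
```

Setting lam = 0 reduces each advantage to the one-step TD error, while lam = 1 yields the full discounted return minus the value baseline.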