Admissions Open for JANUARY Batch

Learn the math of trial-and-error learning through rewards and policies.

Days: Tue & Thu

Duration: 4 Hours

Timings: 8 - 10 PM IST

Try risk-free with a 15-day money-back guarantee

Maths for Reinforcement Learning

Online Live Instructor-Led Learning

By the end of this course

Get stronger in

Reward functions and policy gradients math

Q-learning and value iteration

Get familiar with

Markov Decision Processes (MDPs)

Exploration vs exploitation concepts

New Batch Starts: Jan 2026

Limited seats: only 15 students per batch

Who Should Enroll?

This course is for learners interested in reinforcement learning, focusing on mathematical principles behind decision-making, rewards, Markov processes, and policy optimization for specialized AI applications.

Prerequisites

Working knowledge of deep learning math and probability.

Experience our course risk-free

We offer a 15-day money-back guarantee.

Course Contents

Day 1 - Foundations of RL Math

Topics

  • Markov Decision Processes (MDPs): states, actions, rewards
  • Transition probabilities and reward functions
  • Discounting and return formulation

Key Outcomes

Formalize RL problems mathematically
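
For a flavor of the Day 1 material, here is a minimal sketch of the discounted return G = r_0 + γ·r_1 + γ²·r_2 + ... for a finite reward sequence. It is an illustration only and is not taken from the course materials; the function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Compute G = sum over k of gamma**k * rewards[k] for a finite episode."""
    g = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three steps of reward 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75
```

The backward recursion is the same identity that the Bellman equations on Day 2 build on.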

Day 2 - Bellman Equations & Dynamic Programming

Topics

  • Bellman expectation and optimality equations
  • Policy and value iteration math
  • Convergence properties and proofs

Key Outcomes

Solve for optimal policies and value functions
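
As an illustration of the Day 2 topics, the sketch below runs value iteration on a tiny two-state MDP. The MDP, the data layout `P[s][a]`, and the helper names are all invented for this example; the course may present the algorithm differently. This version updates values in place during each sweep.

```python
# Hypothetical two-state MDP, invented for illustration only.
# P[s][a] is a list of (probability, next_state, reward) outcomes.
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: action 0 stays, action 1 moves
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: action 0 stays, action 1 moves
]


def q_value(P, V, s, a, gamma):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma, tol=1e-10):
    """Apply the Bellman optimality backup until values change by less than tol."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            best = max(q_value(P, V, s, a, gamma) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update within the sweep
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [
        max(range(len(P[s])), key=lambda a, s=s: q_value(P, V, s, a, gamma))
        for s in range(len(P))
    ]
    return V, policy


V, policy = value_iteration(P, gamma=0.9)
print(V, policy)
```

For this toy problem with γ = 0.9, the fixed point can be checked by hand: staying in state 1 forever is worth 2 / (1 − 0.9) = 20, and moving from state 0 is worth 1 + 0.9 · 20 = 19.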

Day 3 - Temporal Difference & Policy Optimization

Topics

  • TD learning derivations
  • Q-learning update rule derivation
  • SARSA math
  • Gradient-based policy optimization

Key Outcomes

Derive and implement RL algorithms
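
To make the Day 3 update rules concrete, here is a minimal tabular sketch contrasting Q-learning and SARSA. The table layout `Q[state][action]` and the function names are assumptions made for this example, not the course's own code.

```python
def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy TD update: bootstrap from the best action in the next state."""
    td_target = r + gamma * max(Q[s_next])
    td_error = td_target - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy TD update: bootstrap from the action actually taken next."""
    td_target = r + gamma * Q[s_next][a_next]
    td_error = td_target - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error


# Two states, two actions; values chosen arbitrarily for the demo.
Q = [[0.0, 0.0], [1.0, 2.0]]
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
print(Q[0][1])  # target is 1 + 0.9 * 2 = 2.8, so the entry moves halfway to it
```

The only difference between the two rules is the bootstrap term: `max(Q[s_next])` for Q-learning versus `Q[s_next][a_next]` for SARSA.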

Day 4 - Advanced Policy Gradients

Topics

  • REINFORCE algorithm math
  • Baseline subtraction derivation
  • Actor-Critic formulation
  • Advantage estimation math (GAE)

Key Outcomes

Understand and optimize actor-critic methods
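
As a taste of the Day 4 advantage estimation topic, the sketch below computes Generalized Advantage Estimation (GAE) for one finite trajectory. It assumes `values` holds one more entry than `rewards` (the last entry is the bootstrap value of the final state); the function name `gae` and this input convention are choices made for the example.

```python
def gae(rewards, values, gamma, lam):
    """Return GAE advantages A_t = sum over l of (gamma*lam)**l * delta_{t+l}.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) is the one-step TD error.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards: A_t = delta_t + gamma * lam * A_{t+1}.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


print(gae([1.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5, lam=0.5))  # [1.25, 1.0]
```

Setting `lam=0` reduces each advantage to the one-step TD error, while `lam=1` gives the discounted sum of TD errors over the rest of the trajectory.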
