
rlgrad


All readings are from the textbook. The readings are kept short so that they are easy to keep up with.

This schedule is tentative, and is likely to change throughout the semester.

The MOOC consists of three courses, with 5 modules in each course. We will complete the MOOC at an accelerated pace so that we have time to focus on projects. The schedule below is based on completing 2 modules per week. The videos are short: each week you will watch around 1 hour of videos for the two modules. The first module in each course is the introduction, followed by 4 content modules.

Course 1 has four content modules: (2) K-Armed Bandit, (3) MDPs, (4) Value Functions and Bellman Equations, and (5) Dynamic Programming.

Course 2 has four content modules: (2) Monte-Carlo for Prediction and Control, (3) TD for Prediction, (4) TD for Control, and (5) Planning, Learning and Acting.

Course 3 has four content modules: (2) On-policy Prediction with Approximation, (3) Constructing Features, (4) Control with Approximation, and (5) Policy Gradient.

Course 4 allows you to put together a full RL agent. This mini-project is optional.

| Week | Date | Topic | Deadlines |
|------|------|-------|-----------|
| 1 | September 2 | Introduction to the Course and Q&A | Modules 2 and 3 from Course 1 will be due by 11:59 pm next Thursday (September 11). Link to Course 1: complete the first two modules after the introduction, (2) K-Armed Bandit and (3) MDPs. |
| 1 | September 4 | Probability Review (by TAs) and MOOC co-working/help session | |
| 2 | September 9 | In-class Discussion about C1M2 and C1M3, Project discussion | |
| 2 | September 11 | Other background requests, short Q&A about C1M2 and C1M3, Project discussions | Course 1, Modules 2 and 3 due (extended to Sunday, Sept 14, 11:59 pm). For next Thursday, complete Modules 4 and 5 in Course 1: (4) Value Functions and Bellman Equations and (5) Dynamic Programming. |
| 3 | September 16 | Code-base overview and using Compute Canada, Q&A on C1M4 and C1M5 | |
| 3 | September 18 | Review of Course 1, In-class Discussion about C1M4 and C1M5 | C1M4 and C1M5 due. Start Course 2: (2) Monte-Carlo for Prediction and Control, (3) TD for Prediction. |
| 4 | September 23 | Adam Lecture: Empirical RL, Q&A on C2M2 and C2M3 | |
| 4 | September 25 | Lecture topic: Exploration basics (epsilon-greedy, softmax, ensembles, optimism and Thompson sampling), In-class Discussion about C2M2 and C2M3, Project Discussions | C2M2 and C2M3 due. Continue Course 2: (4) TD for Control and (5) Planning, Learning and Acting. |
| 5 | September 30 | No classes (National Day for Truth and Reconciliation) | |
| 5 | October 2 | Lecture topic: Off-policy TD and gradient methods, Q&A and In-class Discussion about C2M4 and C2M5 | C2M4 and C2M5 due. Start Course 3: (2) On-policy Prediction with Approximation, (3) Constructing Features. |
| 6 | October 7 | Lecture topic: IQL for batch RL, variants of DQN with target networks, In-class Discussion about C2M4 and C2M5 | |
| 6 | October 9 | Lecture topic: Soft Actor-Critic and Greedy Actor-Critic (and extensions of IQL), In-class Discussion about C3M2 and C3M3 | C3M2 and C3M3 due. Continue Course 3: (4) Control with Approximation, (5) Policy Gradient. |
| 7 | October 14 | Lecture topic: PPO and lambda-returns, Q&A for C3M4 and C3M5 | Finalized project and team due October 14 at 11:59 pm |
| 7 | October 16 | Start Midterm review lecture: Slides | C3M4 and C3M5 due |
| 8 | October 21 | Midterm review lecture: Slides | |
| 8 | October 23 | Midterm | The midterm is effectively a final for all of the MOOC material |
| 9 | October 28 | Miscellaneous lecture topics, Project stand-up + Q&A (groups 1-10) | Read empirical RL paper for November 2 |
| 9 | October 30 | Presentation about writing and reviewing, Project stand-up + Q&A (groups 11-20) | |
| 10 | November 4 | Discuss empirical RL paper | |
| 10 | November 6 | Project stand-up + Q&A (all groups) | |
| 11 | November 11, 13 | No classes: Reading week | |
| 12 | November 18 | Project presentations (8 minutes max per project) | First draft of project due November 19 at 11:59 pm. The draft should outline the problem clearly and include a relatively complete literature survey and a concrete plan for experiments/theory. The full description of requirements is on Canvas. |
| 12 | November 20 | Project presentations (8 minutes max per project) | |
| 13 | November 25 | In-class time to review project drafts and discuss feedback in person, with guidance from TAs | Martha away for a CIFAR event (Learning in Machines & Brains) |
| 13 | November 27 | Project stand-up + Q&A (all groups) | Peer reviews due Thursday, November 27 at 11:59 pm |
| 14 | December 2 | Office hours to discuss projects, peers can help each other | |
| 14 | December 4 | Office hours to discuss projects, peers can help each other | Final projects due by Monday (December 8) 11:59 pm |