rlgrad

All readings are from the textbook. They are designed to be short, so it should be easy to keep up.

This schedule is tentative, and is likely to change throughout the semester.

The MOOC consists of three courses, with four modules in each course. We will complete the MOOC at an accelerated pace so that we have time to focus on projects. The schedule below is based on completing two modules per week. The videos are short: each week you will watch around 1 hour of videos for the two modules.

Course 1 has four modules: (1) K-Armed Bandit, (2) MDPs, (3) Value Functions and Bellman Equations, and (4) Dynamic Programming.

Course 2 has four modules: (1) Monte-Carlo for Prediction and Control, (2) TD for Prediction, (3) TD for Control, and (4) Planning, Learning and Acting.

Course 3 has four modules: (1) On-policy Prediction with Approximation, (2) Constructing Features, (3) Control with Approximation, and (4) Policy Gradient.

Course 4 allows you to put together a full RL agent. This mini-project is optional.

The Google Form for discussion questions is here:

| Week | Date | Topic | Deadlines |
| --- | --- | --- | --- |
| 1 | September 2 | Introduction to the Course and discussion of projects | Modules 1 and 2 from Course 1 will be due by 11:59 pm next Thursday (September 10)<br>Link to Course 1, complete the first two modules: (1) K-Armed Bandit, (2) MDPs<br>Submit a discussion question for C1M1 and C1M2 by midnight on Sunday (September 6), using slido. You’ll need to enter the code corresponding to the week/topic. Event codes can be found on eClass. |
| 2 | September 7 | Holiday, No Classes | |
| 2 | September 9 | Lecture about (requested) background, from Andrew Patterson and Shivam Garg<br>Background topics include probability and linear algebra<br>In-class Discussion about C1M1 and C1M2 | Course 1, Modules 1 and 2 due by end of day Thursday (September 10)<br>For next Thursday, complete Modules 3 and 4 in Course 1: (3) Value Functions and Bellman Equations and (4) Dynamic Programming<br>Submit a discussion question for C1M3 and C1M4 by midnight on Sunday (September 13), using slido. You’ll need to enter the code corresponding to the week/topic. Event codes can be found on eClass. |
| 3 | September 14 | More background, from Andy and Shivam<br>In-class Discussion about C1M3 and C1M4 | September 14 is the last day to drop courses without fees |
| 3 | September 16 | Review of Course 1 and Project discussion | C1M3 and C1M4 due on Thursday (September 17) at 11:59 pm<br>Start Course 2: (1) Monte-Carlo for Prediction and Control, (2) TD for Prediction<br>Submit a discussion question for C2M1 and C2M2 by midnight on Sunday (September 20), using slido |
| 4 | September 21 | In-class Discussion about C2M1 and C2M2 | |
| 4 | September 23 | Discussed convergence of iterative policy evaluation, compared MC and TD | C2M1 and C2M2 due on Thursday (September 24) at 11:59 pm<br>Start Course 2: (3) TD for Control and (4) Planning, Learning and Acting<br>Submit a discussion question for C2M3 and C2M4 by midnight on Sunday (September 27), using slido |
| 5 | September 28 | Discussed and pitched possible projects | October 2 is the last day to drop courses (50% fees) |
| 5 | September 30 | Discussion with Undergrads in RL (Mentoring) | C2M3 and C2M4 due on Thursday (October 1) at 11:59 pm<br>Start Course 3: (1) On-policy Prediction with Approximation, (2) Constructing Features<br>Submit a discussion question for C3M1 and C3M2 by midnight on Sunday (October 4), using slido |
| 6 | October 5 | In-class Discussion about C2M3 and C2M4, lecture about Double Q-learning and off-policy TD | |
| 6 | October 7 | In-class Discussion about C3M1 and C3M2, discuss objectives for RL and how to run good experiments | C3M1 and C3M2 due on Thursday (October 8) at 11:59 pm<br>Start Course 3: (3) Control with Approximation, (4) Policy Gradient<br>Submit a discussion question for C3M3 and C3M4 by midnight on Sunday (October 11), using slido |
| 7 | October 12 | Holiday (Thanksgiving) | Project proposal due today at 11:59 pm |
| 7 | October 14 | Midterm review lecture: Slides | C3M3 and C3M4 due on Thursday (October 15) at 11:59 pm |
| 8 | October 19 | Review Lecture | |
| 8 | October 21 | Midterm | The midterm is effectively a final for all of the MOOC material |
| 9 | October 26 | Discussion about experimental design | |
| 9 | October 28 | Stand-up for projects<br>Short lecture about partial observability | |
| 10 | November 2 | Stand-up for projects<br>Short presentation about writing | |
| 10 | November 4 | Discussion with Undergrads in RL (Mentoring) | |
| 11 | November 9, 11 | No classes: Reading week | |
| 12 | November 16 | Stand-up for projects, Supplementary Lecture | First draft of the project due on November 18 at 11:59 pm. It should outline the problem clearly and include a relatively complete literature survey and a concrete plan for experiments/theory. The full description of requirements is on eClass. |
| 12 | November 18 | Lecture about some common issues in projects<br>More discussion about papers | |
| 13 | November 23 | Stand-up for projects<br>Discuss the Conservative Policy Iteration paper | |
| 13 | November 25 | Open discussion about projects<br>Discuss the paper: Proximal Policy Optimization Algorithms | November 30 is the last day to withdraw from courses |
| 14 | November 30 | Stand-up for projects<br>Discuss the paper: Non-delusional Q-learning and value-iteration | |
| 14 | December 2 | Stand-up for projects<br>Discuss the QUOTA paper | |
| 15 | December 7 | Office hours, to discuss projects | Final projects due by Friday (December 11) at 11:59 pm |