rlgrad

All readings are from the textbook. The readings are short, so it should be easy to keep up with them.

This schedule is tentative, and is likely to change throughout the semester.

The MOOC consists of three courses, with four modules in each course. We will complete the MOOC at an accelerated pace, so that we have time to focus on projects. The schedule below is based on completing two modules per week. The videos are short, and each week you will watch around one hour of videos for the two modules.

Course 1 has four modules: (1) K-Armed Bandit, (2) MDPs, (3) Value Functions and Bellman Equations, and (4) Dynamic Programming.

Course 2 has four modules: (1) Monte-Carlo for Prediction and Control, (2) TD for Prediction, (3) TD for Control, and (4) Planning, Learning and Acting.

Course 3 has four modules: (1) On-policy Prediction with Approximation, (2) Constructing Features, (3) Control with Approximation, and (4) Policy Gradient.

Course 4 allows you to put together a full RL agent. This mini-project is optional.

The Google Form for discussion questions is here:

Week	Date	Topic	Deadlines
1	September 2	Introduction to the course and discussion of projects. Modules 1 and 2 from Course 1 will be due by 11:59 pm next Thursday (September 10).	Link to Course 1. Complete the first two modules: (1) K-Armed Bandit, (2) MDPs. Submit a discussion question for C1M1 and C1M2 by midnight on Sunday (September 6), using slido. You’ll need to enter the code corresponding to the week/topic. Event codes can be found on eClass.
2	September 7	Holiday, No Classes
2	September 9	Lecture about (requested) background, from Andrew Patterson and Shivam Garg. Background topics include probability and linear algebra. In-class Discussion about C1M1 and C1M2.	Course 1, Modules 1 and 2 due by end of day Thursday (September 10). For next Thursday, complete Modules 3 and 4 in Course 1: (3) Value Functions and Bellman Equations and (4) Dynamic Programming. Submit a discussion question for C1M3 and C1M4 by midnight on Sunday (September 13), using slido. You’ll need to enter the code corresponding to the week/topic. Event codes can be found on eClass.
3	September 14	More background, from Andy and Shivam. In-class Discussion about C1M3 and C1M4.	September 14 is the last day to drop courses without fees.
3	September 16	Review of Course 1 and Project discussion	C1M3 and C1M4 due on Thursday (September 17) at 11:59 pm. Start Course 2: (1) Monte-Carlo for Prediction and Control, (2) TD for Prediction. Submit a discussion question for C2M1 and C2M2 by midnight on Sunday (September 20), using slido.
4	September 21	In-class Discussion about C2M1 and C2M2
4	September 23	Discussed convergence of iterative policy evaluation, compared MC and TD	C2M1 and C2M2 due on Thursday (September 24) at 11:59 pm. Start Course 2: (3) TD for Control and (4) Planning, Learning and Acting. Submit a discussion question for C2M3 and C2M4 by midnight on Sunday (September 27), using slido.
5	September 28	Discussed and pitched possible projects	October 2 is the last day to drop courses (50% fees).
5	September 30	Discussion with Undergrads in RL (Mentoring)	C2M3 and C2M4 due on Thursday (October 1) at 11:59 pm. Start Course 3: (1) On-policy Prediction with Approximation, (2) Constructing Features. Submit a discussion question for C3M1 and C3M2 by midnight on Sunday (October 4), using slido.
6	October 5	In-class Discussion about C2M3 and C2M4. Lecture about Double Q-learning and off-policy TD.
6	October 7	In-class Discussion about C3M1 and C3M2. Discuss objectives for RL and how to run good experiments.	C3M1 and C3M2 due on Thursday (October 8) at 11:59 pm. Start Course 3: (3) Control with Approximation, (4) Policy Gradient. Submit a discussion question for C3M3 and C3M4 by midnight on Sunday (October 11), using slido.
7	October 12	Holiday (Thanksgiving)	Project proposal due today at 11:59 pm
7	October 14	Midterm review lecture: Slides	C3M3 and C3M4 due on Thursday (October 15) at 11:59 pm
8	October 19	Review Lecture
8	October 21	Midterm	The midterm is effectively a final for all of the MOOC material
9	October 26	Discussion about experimental design
9	October 28	Stand-up for projects. Short lecture about partial observability.
10	November 2	Stand-up for projects. Short presentation about writing.
10	November 4	Discussion with Undergrads in RL (Mentoring)
11	November 9, 11	No classes: Reading week
12	November 16	Stand-up for projects, Supplementary Lecture	First draft of the project due on November 18 at 11:59 pm. The draft should outline the problem clearly and include a relatively complete literature survey and a concrete plan for experiments/theory. The full description of requirements is on eClass.
12	November 18	Lecture about some common issues in projects. More discussion about papers.
13	November 23	Stand-up for projects. Discuss the Conservative Policy Iteration paper.
13	November 25	Open discussion about projects. Discuss the paper: Proximal Policy Optimization Algorithms.	November 30 is the last day to withdraw from courses.
14	November 30	Stand-up for projects. Discuss the paper: Non-delusional Q-learning and value-iteration.
14	December 2	Stand-up for projects. Discuss the QUOTA paper.
15	December 7	Office hours, to discuss projects	Final projects due by Friday (December 11) 11:59 pm