All readings are from the textbook. The readings are designed to be short, so it should be easy to keep up with them.
This schedule is tentative, and is likely to change throughout the semester.
The MOOC consists of three courses, with four modules in each. We will complete the MOOC at an accelerated pace so that we have time to focus on projects. The schedule below is based on completing two modules per week. The videos are short: each week you will watch around 1 hour of videos for the two modules.
Course 1 has four modules: (1) K-Armed Bandit, (2) MDPs, (3) Value Functions and Bellman Equations, and (4) Dynamic Programming.
Course 2 has four modules: (1) Monte-Carlo for Prediction and Control, (2) TD for Prediction, (3) TD for Control, and (4) Planning, Learning and Acting.
Course 3 has four modules: (1) On-policy Prediction with Approximation, (2) Constructing Features, (3) Control with Approximation, and (4) Policy Gradient.
Course 4 allows you to put together a full RL agent. This mini-project is optional.
The Google Form for discussion questions is here:
Week | Date | Topic | Deadlines |
---|---|---|---|
1 | September 2 | Introduction to the Course and discussion of projects. Modules 1 and 2 from Course 1 will be due by 11:59 pm next Thursday (September 10). | Link to Course 1: complete the first two modules, (1) K-Armed Bandit and (2) MDPs. Submit a discussion question for C1M1 and C1M2 by midnight on Sunday (September 6), using slido. You’ll need to enter the code corresponding to the week/topic. Event codes can be found on eClass. |
2 | September 7 | Holiday, No Classes | |
2 | September 9 | Lecture about (requested) background, from Andrew Patterson and Shivam Garg. Background topics include probability and linear algebra. In-class Discussion about C1M1 and C1M2. | Course 1, Modules 1 and 2 due by end of day Thursday (September 10). For next Thursday, complete Modules 3 and 4 in Course 1: (3) Value Functions and Bellman Equations and (4) Dynamic Programming. Submit a discussion question for C1M3 and C1M4 by midnight on Sunday (September 13), using slido. You’ll need to enter the code corresponding to the week/topic. Event codes can be found on eClass. |
3 | September 14 | More background, from Andy and Shivam. In-class Discussion about C1M3 and C1M4. | September 14 is the last day to drop courses without fees. |
3 | September 16 | Review of Course 1 and Project discussion | C1M3 and C1M4 due on Thursday (September 17) at 11:59 pm. Start Course 2: (1) Monte-Carlo for Prediction and Control, (2) TD for Prediction. Submit a discussion question for C2M1 and C2M2 by midnight on Sunday (September 20), using slido. |
4 | September 21 | In-class Discussion about C2M1 and C2M2 | |
4 | September 23 | Discussed convergence of iterative policy evaluation, compared MC and TD | C2M1 and C2M2 due on Thursday (September 24) at 11:59 pm. Start Course 2: (3) TD for Control and (4) Planning, Learning and Acting. Submit a discussion question for C2M3 and C2M4 by midnight on Sunday (September 27), using slido. |
5 | September 28 | Discussed and pitched possible projects | October 2 is the last day to drop the course (50% fees). |
5 | September 30 | Discussion with Undergrads in RL (Mentoring) | C2M3 and C2M4 due on Thursday (October 1) at 11:59 pm. Start Course 3: (1) On-policy Prediction with Approximation, (2) Constructing Features. Submit a discussion question for C3M1 and C3M2 by midnight on Sunday (October 4), using slido. |
6 | October 5 | In-class Discussion about C2M3 and C2M4, lecture about Double Q-learning and off-policy TD | |
6 | October 7 | In-class Discussion about C3M1 and C3M2, discuss objectives for RL and how to run good experiments | C3M1 and C3M2 due on Thursday (October 8) at 11:59 pm. Start Course 3: (3) Control with Approximation, (4) Policy Gradient. Submit a discussion question for C3M3 and C3M4 by midnight on Sunday (October 11), using slido. |
7 | October 12 | Holiday (Thanksgiving) | Project proposal due today at 11:59 pm |
7 | October 14 | Midterm review lecture: Slides | C3M3 and C3M4 due on Thursday (October 15) at 11:59 pm |
8 | October 19 | Review Lecture | |
8 | October 21 | Midterm | The midterm is effectively a final for all of the MOOC material |
9 | October 26 | Discussion about experimental design | |
9 | October 28 | Stand-up for projects. Short lecture about partial observability. | |
10 | November 2 | Stand-up for projects. Short presentation about writing. | |
10 | November 4 | Discussion with Undergrads in RL (Mentoring) | |
11 | November 9, 11 | No classes: Reading week | |
12 | November 16 | Stand-up for projects, Supplementary Lecture | First draft of the project due on November 18 at 11:59 pm. The draft should outline the problem clearly and include a relatively complete literature survey and a concrete plan for experiments/theory. The full description of requirements is on eClass. |
12 | November 18 | Lecture about some common issues in projects. More discussion about papers. | |
13 | November 23 | Stand-up for projects. Discuss the Conservative Policy Iteration paper. | |
13 | November 25 | Open discussion about projects. Discuss the paper: Proximal Policy Optimization Algorithms. | November 30 is the last day to withdraw from courses. |
14 | November 30 | Stand-up for projects. Discuss the paper: Non-delusional Q-learning and value-iteration. | |
14 | December 2 | Stand-up for projects. Discuss the QUOTA paper. | |
15 | December 7 | Office hours, to discuss projects | Final projects due by Friday (December 11) 11:59 pm |