CMPUT 655 Reinforcement Learning 1
Schedule
Syllabus
Described below.
Term:
Fall, 2025
Lecture Date and Time:
TR 11:00 a.m. - 12:20 p.m.
Lecture Location:
MEC 4-3
Instruction Team:
Martha White (whitem@ualberta.ca)
Parham Panahi
Anastasiia Pedan
Office Hours:
Listed on eClass
Overview
This course provides an introduction to reinforcement learning, which focuses on the study and design of agents that interact with a complex, uncertain world to achieve a goal. We will emphasize agents that can make near-optimal decisions in a timely manner with incomplete information and limited computational resources. The course will cover Markov decision processes, planning, and policy and value function learning algorithms. The course will be heavily focused on research projects.
The course will use our MOOC on Reinforcement Learning, created by the instructor of this course. Most of the lecture material and assignments will come from the MOOC. In-class time will be spent on discussion and preparing for projects, and later on regular project discussions and presentations. The MOOC content will be covered more quickly (by mid-October), so as to focus on research projects in the second half of the course.
Objectives
There are two primary goals for this course: to become an expert in the fundamentals of reinforcement learning and to complete a research project in reinforcement learning.
By the end of the course, you will have a solid grasp of the main ideas in reinforcement learning, which is the primary approach to statistical decision-making. Any student who understands the material in this course will understand the foundations of much of modern probabilistic artificial intelligence (AI) and be prepared to take more advanced courses (in particular CMPUT 609: Reinforcement Learning II), or to apply AI tools and ideas to real-world problems. That person will be able to apply these tools and ideas in novel situations: for example, to determine whether the methods apply to a given situation and, if so, which will work most effectively. They will also be able to assess claims made by others, with respect to both software products and general frameworks, and to appreciate some new research results.
The goal for the research project is to gain experience properly specifying and answering a research question. Projects in courses have a tendency to be too big, and be largely incomplete by the end of the course. That will not be acceptable in this course. The goal here is to specify a small, feasible question that can be addressed thoroughly within the time-frame of the course. No research paper is ever fully complete, but the standard is: could this be submitted to a real research venue? This means you have specified a clear question, and obtained clear evidence (complete theoretical statement or thorough empirical study with controls and statistical significance). To ensure this occurs, I will provide a list of well-scoped questions and part of class-time will be spent discussing these to help you decide on the project.
Finally, we will use some amount of peer feedback and peer grading in this course. It is important to start learning how to give and receive constructive feedback on research.
Prerequisites
The course will use Python 3. We will use elementary ideas of probability, calculus, and linear algebra, such as expectations of random variables, conditional expectations, partial derivatives, vectors and matrices. Students should either be familiar with these topics or be ready to pick them up quickly as needed by consulting outside resources. You will also need to write up your papers in LaTeX.
Course Topics
With a focus on AI as the design of agents learning from experience to predict and control their environment, topics will include
- Markov decision processes
- Planning by approximate dynamic programming
- Monte Carlo and Temporal Difference Learning for prediction
- Monte Carlo, Sarsa and Q-learning for control
- Dyna and planning with a learned model
- Prediction and control with function approximation
- Policy gradient methods
We will also read a paper on how to do better experiments in RL, which includes pitfalls to avoid.
Course Work and Evaluation
The primary evaluation will be from the project. You will have an initial draft of the project due in early November, to ensure you’ve made progress on a concrete and feasible project. The final project draft will be due on the last day of classes, and should be treated as a paper write-up that could be submitted to a venue (workshop, conference or journal). We will provide a list of project ideas that you must choose from. In some cases there may be a graduate student willing to provide guidance on the project. We will also actively use Slack to discuss the projects, and use in-class time to do stand-ups on the projects. This has the dual purpose of pushing the projects forward with guidance and helping you learn from the behaviors of others. We will act something like a big lab, helping each other out. Projects will be done in groups of 3-4 people.
The course work will come from the quizzes and assignments on the Coursera Platform. Each week, there will be one or two small programming assignments (notebooks) and/or one or two multiple-choice quizzes due. Each week, you also have to complete the practice quizzes and submit a discussion question by midnight on Sunday, for discussion in class. That means you have to have completed the lectures and readings for that week as well. The course will have a midterm exam, which will come after we complete the MOOC by mid-October. The remainder of the course will be focused on projects.
There are 12 graded assignments. They are usually Python notebooks, but sometimes an assignment is a Graded Quiz or a Peer Review. All items will be due on Thursday at 11:59 pm. We will let you drop the lowest 2 of 12, to account for cases where you get sick or miss the assignment for some reason. These mulligans should address one-off issues; if you find you need more than that, there is likely a longer, ongoing issue and you need to contact Martha to resolve it more holistically. Each of the 10 counted assignments has equal weight (30%/10 = 3% each).
- Assignments (graded on Coursera): 30%
- Midterm Exam: 15%
- Participation (in project stand-ups and project discussions), Project Presentations and Peer Review: 10%
- Initial Draft of Project: 10%
- Final Draft of Project: 35%
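As a rough illustration of the grading arithmetic described above (dropping the lowest 2 of 12 assignment scores, then combining the components with the listed weights), here is a minimal Python sketch. It assumes every score is expressed as a fraction between 0 and 1 and that a missed assignment simply counts as 0; the function names are hypothetical and are not part of any course tooling.

```python
# Component weights in percentage points, taken from the list above.
WEIGHTS = {
    "assignments": 30.0,
    "midterm": 15.0,
    "participation": 10.0,
    "initial_draft": 10.0,
    "final_draft": 35.0,
}


def assignment_component(scores, n_dropped=2):
    """Return the assignment contribution to the final grade, in points out of 30.

    scores: one fractional score in [0, 1] per graded assignment
            (assumption: a missed assignment is recorded as 0.0).
    """
    if len(scores) <= n_dropped:
        raise ValueError("need more scores than dropped assignments")
    kept = sorted(scores)[n_dropped:]  # drop the n_dropped lowest scores
    # The remaining assignments are weighted equally.
    return WEIGHTS["assignments"] * sum(kept) / len(kept)


def final_grade(assignment_scores, midterm, participation, initial_draft, final_draft):
    """Combine all components (each a fraction in [0, 1]) into a grade out of 100."""
    return (
        assignment_component(assignment_scores)
        + WEIGHTS["midterm"] * midterm
        + WEIGHTS["participation"] * participation
        + WEIGHTS["initial_draft"] * initial_draft
        + WEIGHTS["final_draft"] * final_draft
    )


if __name__ == "__main__":
    # Ten perfect assignments and two missed ones: the two zeros are dropped.
    scores = [1.0] * 10 + [0.0, 0.0]
    print(assignment_component(scores))
    print(final_grade(scores, midterm=0.8, participation=1.0,
                      initial_draft=0.9, final_draft=0.85))
```

With 12 assignments and 2 dropped, each of the 10 remaining assignments contributes 3 points to the final grade, so two missed assignments cost nothing, while a third missed assignment would cost 3 points.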
Course Materials
All course reading material will be available online. We will be using videos from the RL MOOC. We will be using the following textbook extensively: Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press. The book is available from the bookstore or online as a pdf here: http://www.incompleteideas.net/book/the-book-2nd.html
Academic Integrity
All assignments, written and programming, are to be done individually. No exceptions. Students must write their own answers and code. Students are permitted and encouraged to discuss assignment problems and the contents of the course. However, the discussion should always be about high-level ideas. Students should not discuss with each other (or tutors) while writing answers to written questions or programming. Absolutely no sharing of answers or code with other students or tutors. All the sources used for a problem solution must be acknowledged, e.g. web sites, books, research papers, personal communication with people, etc. The University of Alberta is committed to the highest standards of academic integrity and honesty. Students are expected to be familiar with these standards regarding academic honesty and to uphold the policies of the University in this respect. Students are particularly urged to familiarize themselves with the provisions of the Code of Student Behaviour and avoid any behaviour which could potentially result in suspicions of cheating, plagiarism, misrepresentation of facts and/or participation in an offence. Academic dishonesty is a serious offence and can result in suspension or expulsion from the University. (GFC 29 SEP 2003)
FAQ
See the course FAQ. Note that this is only publicly visible to University of Alberta students. Ensure that you are logged in with your University of Alberta account when you want to access this FAQ.