I completed the Reinforcement Learning course (link) as part of the OMSCS Spring 2017 semester. It was one of the most rewarding courses I have taken in the program to date.
The course was taught by professors Charles Isbell and Michael Littman, the same professors who had previously taught the Machine Learning course (blog link). The course was really challenging, given the tightly packed, research-oriented homework assignments and projects as well as the math- and theory-heavy course material. We had:
- 6 homework assignments, which involved implementing different RL algorithms to solve given problems
- 3 projects, two of which reproduced experimental results from prominent RL research papers and one of which involved solving an RL problem using OpenAI Gym
Summarizing my key learnings from RL below:
- Reinforcement learning helps you train an AI agent to maximise some form of reward without prior knowledge of the environment, i.e. in a model-free way.
- E.g.: Pacman. Here the agent (or player) can roam around the space using the possible actions (left, right, up, down). When it consumes one of the small orbs, it gets points (+ve reward). When it eats one of the big orbs and then eats the enemy players, it gets even more points. However, if it’s eaten by one of the enemy players, it loses a life (-ve reward). If you let an RL agent play Pacman for some time, it will start out playing randomly but eventually figure out the rules of the game, and can potentially play better than a human player. All this without us injecting any domain knowledge (rules of the game, winning strategies etc.) beforehand! (crazy, right?)
- Most RL research assumes the problem can be modelled as an MDP (Markov Decision Process). These are processes where the current state of the agent captures everything about the past that matters for deciding what to do next.
- Learned about different RL algorithms such as:
- Value Iteration
- Policy Iteration
- TD-Lambda etc.
- Generalization using function approximation – This seemed to me to be one of the most promising sections of RL. It can effectively take RL outside the confines of Grid world and into the big and continuous state spaces of the real world.
- For one of our projects, we used DQN (Deep Q-Networks), one of the latest efforts in generalization using deep neural networks, published by DeepMind – a Google company.
- Reward Shaping – a mechanism to accelerate the agents’ learning and help them get to their goals faster.
- POMDPs (Partially Observable MDPs) – These are closer to the processes we see in real life. We don’t get to know exactly which state we are in. We have to work with ‘belief states’, i.e. probability distributions over the possible states we might be in.
- Game Theory – I found this to be the most fun part of the course. It deals with stochastic games where multiple agents try to maximise their collective/competing rewards. This is again closer to the situations which we face in real-life. Topics include:
- Prisoner’s Dilemma
- Nash Equilibrium
- Folk theorem and sub-game perfect equilibrium
- Tit-for-Tat, Grim trigger, Pavlov etc. game strategies
- Coordinated equilibria, using side payments (Coco-Q) etc.
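To make the game strategies listed above a bit more concrete, here is a minimal Python sketch of an iterated Prisoner’s Dilemma. It is not from the course material: the payoff numbers are the commonly used illustrative values, and the names `PAYOFF`, `play` and `always_defect` are my own assumptions for this example.

```python
# Payoffs as (player A, player B); "C" = cooperate, "D" = defect.
# These are commonly used illustrative values, not numbers from the course.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return their_history[-1] if their_history else "C"


def grim_trigger(my_history, their_history):
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in their_history else "C"


def pavlov(my_history, their_history):
    # Win-stay, lose-shift: cooperate if both players made the same
    # move in the previous round, otherwise defect.
    if not my_history:
        return "C"
    return "C" if my_history[-1] == their_history[-1] else "D"


def always_defect(my_history, their_history):
    # Hypothetical baseline opponent that never cooperates.
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play the repeated game and return the total scores (A, B)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees its own history first, then the opponent's.
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))
    print(play(tit_for_tat, always_defect))
    print(play(pavlov, always_defect))
```

With these payoffs, two Tit-for-Tat players cooperate in every round, while Tit-for-Tat and Grim trigger both lose only the first round to a pure defector before switching to defection themselves. Pavlov keeps flipping back to cooperation against a pure defector, which is why it does worse in that matchup.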
The course content was a bit too theoretical in some chapters (e.g.: AAA – Advanced Algorithm Analysis). I found the lectures by David Silver of DeepMind to be a good supplementary course for building the intuition required for this course – link.
One of the really exciting moments in this course was when Prof. Richard Sutton appeared at one of our office hours as a special guest. He is considered by many to be the father of Reinforcement Learning and is the author of the primary RL textbook (for our course and elsewhere), ‘Reinforcement Learning: An Introduction’ (second edition draft available from the author’s website – link).
I found all the TAs for this course really knowledgeable and helpful. All the office hours were really useful and fun-filled at the same time. One of our TAs, Miguel Morales, was recently featured on the OMSCS website – link.
In conclusion, this course has been one helluva ride that I enjoyed throughout! 🙂