Reinforcement Learning – Experience

I completed the Reinforcement Learning course (link) as part of the OMSCS Spring 2017 semester. It has been one of the most rewarding courses I have taken in the program to date.

The course was taught by professors Charles Isbell and Michael Littman, the same Profs who taught the Machine Learning course I took previously (blog link). The course was really challenging, considering the closely packed, research-oriented homeworks and projects as well as the math- and theory-heavy course material. We had

  • 6 homeworks which involved implementing different RL algorithms to solve given problems
  • 3 projects, two of which involved reproducing experimental results from prominent RL research papers, while the third involved solving an RL problem using OpenAI Gym

Summarizing my key learnings from RL below:

  • Reinforcement learning helps you train an AI agent to maximise some form of reward without any prior understanding of the environment, i.e. model-free.
  • E.g.: Pacman. Here the agent (or player) can roam around the space using the possible actions (left, right, up, down). When it consumes one of the small orbs, it gets points (+ve reward). When it eats the big orbs and then eats the enemy players, it gets even more points. However, if it’s eaten by one of the enemy players, it loses a life (-ve reward). If you let an RL agent play Pacman for some time, it will start by playing randomly but will eventually figure out the rules of the game and can potentially play better than a human player. All this without us injecting any domain knowledge (rules of the game, winning strategies etc.) beforehand! (crazy right?)
  • Most RL research assumes all processes can be represented using MDPs (Markov Decision Processes). These are processes where the current state captures all the relevant information from the past (the Markov property).
  • Learned about different RL algorithms such as:
    • Value Iteration
    • Policy Iteration
    • Q-learning (a toy sketch appears after this list)
    • TD(λ), etc.
  • Generalization using function approximation – This seemed to me to be one of the most promising areas of RL. It can effectively take RL outside the confines of grid worlds and into the large, continuous state spaces of the real world.
    • For one of our projects, we used DQN (Deep Q-Networks), one of the latest efforts in generalization using deep neural networks, published by DeepMind – a Google company.
  • Reward Shaping – a mechanism to accelerate an agent’s learning and help it reach its goal faster.
  • POMDPs (Partially Observable MDPs) – These are closer to the processes we see in real life. We don’t get to know fully which state we are in, and have to work with ‘belief states’, i.e. probability distributions over the possible states we might be in.
  • Game Theory – I found this to be the most fun part of the course. It deals with stochastic games where multiple agents try to maximise their collective/competing rewards. This is again closer to the situations we face in real life. Topics include:
    • Prisoner’s Dilemma (a small iterated version is simulated in the sketch after this list)
    • Nash Equilibrium
    • Folk theorem and sub-game perfect equilibrium
    • Game strategies such as Tit-for-Tat, Grim Trigger, Pavlov, etc.
    • Correlated equilibria and side payments (Coco-Q), etc.
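
To make the Q-learning idea concrete, here is a toy tabular sketch on a made-up 1-D corridor. This is purely illustrative and not one of the actual homeworks; the states, rewards and hyperparameters are invented.

```python
# Toy tabular Q-learning: states 0..4 along a corridor, actions are
# step left (-1) or right (+1), and the only reward is reaching state 4.
import random

n_states, actions = 5, [-1, +1]
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1      # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != n_states - 1:               # episode ends at the goal state
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: bootstrap from the best action in the next state
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# Greedy policy for the non-terminal states (it should learn to always go right).
print([max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)])
```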

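And a tiny iterated Prisoner’s Dilemma simulation, again a toy illustration rather than a course assignment: Tit-for-Tat starts by cooperating and then mirrors the opponent’s previous move, while Always-Defect simply defects every round. The payoff matrix is the standard one.

```python
# Standard PD payoffs: (my score, opponent's score) for each pair of moves.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(hist_b), strategy_b(hist_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, always_defect))   # (9, 14): TFT loses only the first round
print(play(tit_for_tat, tit_for_tat))     # (30, 30): mutual cooperation
```
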
The course content was a bit too theoretical in some chapters (e.g.: AAA – Advanced Algorithm Analysis). I found the lectures by David Silver of DeepMind to be a good supplementary course for building the required intuition – link.

One of the really exciting moments in this course was when Prof. Richard Sutton, considered by many as the father of Reinforcement Learning and the author of the primary RL textbook (for our course and elsewhere), ‘Reinforcement Learning: An Introduction’ (a second edition draft is available from the author’s website – link), appeared as a special guest for one of our office hours.

Prof. Richard Sutton along with our TAs during an office hour

I found all the TAs for this course really knowledgeable and helpful. All the office hours were really useful and fun-filled at the same time. One of our TAs, Miguel Morales, was recently featured on the OMSCS website – link.

In conclusion, this course has been one helluva ride that I enjoyed throughout! 🙂

Machine Learning – Experience

I recently completed CS 7641 – Machine Learning as part of my OMSCS coursework. The course was really enjoyable and informative.

The course was taught by Professors Charles Isbell and Michael Littman. Both are really awesome. Contrary to most other courses on the topic, they have managed to make the course content easy to understand and interesting, without losing any of its essence. All videos are structured as conversations between the Profs, where one acts as the teacher and the other as the student – very effective.

All the course videos are available publicly on YouTube – link. Also, I would recommend watching the Profs’ funny ML a cappella based on Thriller – link. 🙂

The course was a literature survey and general introduction to the various areas of ML. It was primarily divided into 3 modules:

  • Supervised learning – where we are given a dataset with labels (e.g. emails classified as spam or not), and we try to predict the labels for future data based on what we have already seen or ‘learned’.
    • Techniques include Decision Trees, K-Nearest Neighbours, Support Vector Machines (SVM), Neural Networks, etc.
  • Unsupervised learning – all about finding patterns in unlabeled data. E.g. grouping similar products together (clustering) based on customer interactions. This can be really helpful for recommendations, etc.
    • Randomized Optimization, clustering, feature selection and transformation etc.
  • Reinforcement learning – the most exciting one (IMHO). This overlaps with many concepts we usually consider part of Artificial Intelligence. RL is about incentivizing machines to learn various tasks (such as playing chess) by providing different rewards.
    • Markov Decision Processes, Game Theory etc.
    • I found the Game Theory concepts such as the Prisoner’s Dilemma and Nash Equilibrium, and how they tie into RL, particularly interesting.

All of these are very vast subjects in themselves. The assignments were designed in such a way that we got to work with all of these techniques at least to some extent. The languages and libraries we used were left to our choice, though guidance and recommendations were provided. Through that, I got the opportunity to work with Weka, scikit-learn and BURLAP. A minimal sketch of the kind of scikit-learn experiment involved is shown below.
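
The dataset and parameters here are illustrative and not the actual assignment setup; it simply shows the general train/test/evaluate flow with a supervised learner.

```python
# Illustrative scikit-learn experiment: a depth-limited decision tree on the
# built-in iris dataset (not the actual assignment data or parameters).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(max_depth=3)   # limit depth to control overfitting
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```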

Overall, I enjoyed the course thoroughly. Hoping to take courses like Reinforcement Learning (link) in upcoming semesters to learn more about these topics.

AI for Robotics – Experience

I took the AI for Robotics class as part of the Summer ’16 OMSCS semester. It was a really interesting and challenging experience. It was taught by Prof. Sebastian Thrun, who led the self-driving car project at Google. It was his team from Stanford which won the DARPA Grand Challenge in 2005, driving a car (Stanley) over a 212 km off-road course and coming first. Incidentally, Prof. Thrun is a co-founder of Udacity and was its CEO until recently.

The class consisted of two portions: 

  • a series of lectures combined with small programming tasks
  • two open-ended projects related to self-driving cars

The whole course centers on the use of probabilistic models to estimate the various parameters involved, such as the location of the robot car, the locations of various landmarks and obstacles, and moving targets such as other cars, pedestrians etc. The Prof also has an aptly titled textbook, ‘Probabilistic Robotics’, to go along with the course (though I couldn’t make much use of it).

The lectures covered the following topics:

Localization

Noise is an inherent part of robotics.

There will be noise in the robot motion. E.g.: If we instruct the robot to move 5 meters, it might end up moving only 4.8 meters due to tire slippage or an uneven surface.

There will be noise in sensor measurements. E.g.: If the sensor readings tell us we are 3 meters from the car ahead, the actual distance might be 2.7 meters.

How can a robot car navigate the road safely given all this noise? That is exactly what localization addresses. The term refers to various techniques which help us ‘see through’ the noise and estimate the robot’s actual location and motion. The following localization techniques were taught in class:

  • Kalman filters: These work best for linear motions. The predictions here are Gaussian distributions and hence uni-modal, i.e. the prediction only tells us the highest-probability location of the robot (no info on the 2nd or 3rd highest-probability locations, etc.). However, there are extensions of the standard KF, such as the Extended KF and Unscented KF, which handle non-linear motions. A minimal 1-D sketch appears after the figure below.
  • Particle filters: These seem best suited for localization since they work for non-linear motions and support multi-modal distributions.
Localization in action: Hex bug path in black and localized particle in blue
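
To give a feel for the Kalman filter cycle, here is a minimal 1-D predict/update sketch. The noise values and readings are made up; it is not the course code or a full matrix-form filter.

```python
# Minimal 1-D Kalman filter: the belief is a Gaussian (mean mu, variance sigma2).
def predict(mu, sigma2, motion, motion_var):
    # Motion shifts the mean and adds uncertainty to the variance.
    return mu + motion, sigma2 + motion_var

def update(mu, sigma2, measurement, measurement_var):
    # Fuse the predicted Gaussian with the measurement Gaussian.
    k = sigma2 / (sigma2 + measurement_var)      # Kalman gain
    return mu + k * (measurement - mu), (1 - k) * sigma2

mu, sigma2 = 0.0, 1000.0                         # start almost completely uncertain
for z in [5.0, 6.1, 7.2, 8.0]:                   # made-up noisy position readings
    mu, sigma2 = predict(mu, sigma2, motion=1.0, motion_var=0.5)
    mu, sigma2 = update(mu, sigma2, z, measurement_var=2.0)
    print(f"estimate: {mu:.2f} (variance {sigma2:.2f})")
```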

Search

Self-driving cars need to find the optimal path to their destination as well. The technique used for finding the optimal path without exploring the entire state space is the A* algorithm. Those who have studied AI in undergrad might be familiar with the approach. It involves the use of a heuristic function which scores each possible move based on how far the resulting state is from the goal state. A small grid example is sketched below.
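
This is a minimal sketch of A* on a tiny grid; the grid, unit step costs and Manhattan-distance heuristic are illustrative assumptions, not the actual assignment setup.

```python
# A* on a 4-connected grid (0 = free cell, 1 = obstacle).
import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    h = lambda cell: abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]   # entries are (f = g + h, g, cell, path)
    visited = set()
    while frontier:
        f, g, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if cell in visited:
            continue
        visited.add(cell)
        r, c = cell
        for nr, nc in [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]:
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(frontier, (g + 1 + h((nr, nc)), g + 1, (nr, nc), path + [(nr, nc)]))
    return None  # no path found

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))  # path that goes around the obstacles
```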

Control Theory

Humans drive cars smoothly. If we ask a robot to move along a particular course, by default it will either overshoot or undershoot its goal and then correct itself. This is because of the inherent delay in the move-sense feedback cycle. This keeps repeating, leading to a zig-zag motion and an overall unpleasant (and potentially dangerous) driving experience. There is a whole domain of control systems devoted to smoothing out the robot’s motion as it approaches its desired course.

The technique we learned is the PID controller. This controller adjusts the steering angle of the robot at every point of its motion based on proportional, integral and differential terms computed from its CTE, or cross-track error (the lateral distance between the robot and the reference trajectory). A toy sketch follows the figure below.

Here A represents robot motion without any controller and B represents one with PID controller.
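
A toy sketch of the PID idea follows. The gains and the extremely simplified “car” model are made up for illustration (hand-tuned so this particular toy loop converges); a real controller would act on actual steering dynamics.

```python
# Toy PID steering sketch on a made-up 1-D cross-track error signal.
def pid_steering(cte_history, kp=0.5, ki=0.01, kd=0.3, dt=1.0):
    cte = cte_history[-1]
    prev_cte = cte_history[-2] if len(cte_history) > 1 else cte
    p = kp * cte                      # proportional: react to the current error
    i = ki * sum(cte_history) * dt    # integral: correct accumulated bias
    d = kd * (cte - prev_cte) / dt    # differential: damp the overshoot
    return -(p + i + d)               # steering correction

# Crude stand-in for the real car dynamics: the steering directly shifts the
# cross-track error each step. Watch the CTE shrink toward zero.
cte, history = 1.0, []
for step in range(10):
    history.append(cte)
    cte += pid_steering(history)
    print(f"step {step}: cte = {cte:.3f}")
```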


Runaway robot

The first project was a set of 4 interesting challenges (plus a bonus challenge for the extra smart ones) where we needed to locate a robot (aptly named 404) which had run away from an assembly line and capture it using a hunter bot. This was an individual project. It required some level of ingenuity to come up with a working solution, since the lessons from class were not directly applicable here.

Hunter bot (blue) chasing the runaway bot (black). The red dots are future predictions with which the hunter tries to capture the bot.

Hex bug motion prediction

The second project was a team project. Here we were given the coordinates of the random movements of a hex bug for 2 minutes at 30 fps (frames per second). We needed to predict the last 2 seconds, i.e. 60 frames, of the bug’s motion. This was an open-ended problem and we could use any technique from inside or outside the class. We were a team of 4 and explored various techniques, including clustering trajectories and creating a Markov model, and finally ended up using a particle filter to solve it.

Predictions of hex bug path using various approaches against actual bug path (in black)

Overall, I enjoyed the class a lot!

Knowledge based AI – Experience

I enrolled in the Spring ’16 batch of the OMSCS program offered by Georgia Tech and Udacity. As my first course in the program, I chose Knowledge-Based AI, taught by Prof. Ashok Goel and David Joyner. The video sessions of the course can be freely accessed through the Udacity website here.

The class was very interesting and insightful. I thoroughly enjoyed the 3 main projects we had to do throughout the class. The overall class was focused on systematically studying human-level intelligence/cognition and seeing how we can build it using technology. As a measure of assessing human cognition, the class used Raven’s Progressive Matrices (RPM).

Our class came into the internet limelight after the course, when Prof. Ashok revealed that one of our TAs – Jill Watson – was actually a bot. It was covered widely by the press – the Washington Post and WSJ.

With regard to the course content, we were introduced to the broad areas that come under human level AI research such as:

  • Semantic Networks, Frames and Scripts – useful knowledge representations
  • Generate & Test and Means-End Analysis – two popular problem-solving techniques (a tiny Generate & Test sketch follows this list)
  • Production systems – rule-based systems which are useful in AI
  • Learning by Recording Cases and Case-based reasoning – techniques to learn from past examples and to adapt them as required
  • Incremental concept learning – contrary to approaches like ML where we feed millions of examples to train models, human cognition deals with incremental data inputs. Incremental concept learning describes how we can use generalizations and specializations to make inferences from these inputs.
  • Classification – mapping percepts to concepts so that we can take actions
  • Formal Logic – techniques from predicate calculus, such as resolution theorem proving, are useful in some cases of reasoning. However, human cognition is inductive and abductive in nature, whereas logic is deductive.
  • Understanding – Humans constantly deal with ambiguity. E.g.: the same word can have different contextual meanings. Understanding is how we leverage available constraints to resolve such ambiguities.
  • Common sense reasoning – about modeling our world in terms of a set of primitive actions and their combination.
  • Explanation based learning and Analogical reasoning – deals with how we can extract ‘abstractions’ from prior knowledge and transfer these to new situations
  • Diagnosis and learning by correcting mistakes – Determining what is wrong with a malfunctioning device/system. Learning by correcting these mistakes.
  • Meta-reasoning – reasoning about our own reasoning process and applying all the above techniques to the same.
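
As a tiny illustration of Generate & Test (a made-up constraint puzzle, not a course problem): generate candidate solutions, test each one against the constraints, and keep the first that passes.

```python
# Generate & Test on a toy puzzle: assign the digits 1..5 to the letters A..E.
from itertools import permutations

def generate():
    # Generator: every assignment of the digits 1..5 to the letters A..E.
    for digits in permutations(range(1, 6)):
        yield dict(zip("ABCDE", digits))

def test(candidate):
    # Tester: made-up constraints the solution must satisfy.
    return (candidate["A"] + candidate["B"] == candidate["C"]
            and candidate["D"] > candidate["E"])

solution = next(c for c in generate() if test(c))
print(solution)
```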

As part of the projects, we built small ‘AI agents’ that process RPM problem images and use various techniques to solve them. These were quite challenging and required a fair amount of programming. For me, it was a good learning opportunity for building something non-trivial in Python. We used the Python image processing library PIL and various image processing techniques like connected-component labelling; a rough sketch of the labelling idea is shown below.
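
The tiny binary “image” and the BFS flood fill here are illustrative; the actual agents worked on real RPM images loaded via PIL.

```python
# Connected-component labelling on a binary image (1 = foreground pixel).
from collections import deque

def label_components(img):
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] == 1 and labels[r][c] == 0:
                current += 1                      # found a new component
                labels[r][c] = current
                queue = deque([(r, c)])
                while queue:                      # BFS flood fill
                    y, x = queue.popleft()
                    for ny, nx in [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]:
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and img[ny][nx] == 1 and labels[ny][nx] == 0:
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current

image = [[1, 1, 0, 0],
         [0, 1, 0, 1],
         [0, 0, 0, 1]]
labels, count = label_components(image)
print(count)   # two separate shapes in this toy image
```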

Considering the vastness of the topic – human-level intelligence – and that I intend to specialize in ‘Interactive Intelligence’, I found the course really interesting and informative. Enjoyed it!

I’m adding a nice poster created by my classmate Eric on the high level topics we studied in this class.

[Poster: high-level topics covered in the KBAI class]