Reinforcement Learning – Experience

I completed the Reinforcement Learning course (link) in the Spring 2017 semester of OMSCS. It is one of the most rewarding courses I have taken in the program so far.

The course was taught by Professors Charles Isbell and Michael Littman, the same professors who taught the Machine Learning course I took previously (blog link). The course was really challenging, given the closely packed, research-oriented homework and projects as well as the math- and theory-heavy course material. We had:

  • 6 homework assignments, which involved implementing different RL algorithms to solve given problems
  • 3 projects, two of which reproduced experimental results from prominent RL research papers and one of which solved an RL problem using OpenAI Gym

Summarizing my key learnings from RL below:

  • Reinforcement learning lets you train an AI agent to maximise some form of reward without prior knowledge of the environment, i.e. in a model-free way.
  • E.g.: Pacman. Here the agent (or player) can roam around the space using the possible actions (left, right, up, down). When it consumes one of the small orbs, it gets points (+ve reward). When it eats one of the big orbs and then eats the enemy players, it gets more points. However, if it’s eaten by one of the enemy players, it loses a life (-ve reward). If you let an RL agent play Pacman for some time, it will start out playing randomly but eventually figure out the rules of the game, and can potentially play better than a human player. All this without us injecting any domain knowledge (rules of the game, winning strategies etc.) beforehand! (crazy right?)
  • Most RL research assumes the process can be modelled as an MDP (Markov Decision Process). These are processes where the current state of the agent captures everything about the past that matters for the future.
  • Learned about different RL algorithms such as:
    • Value Iteration
    • Policy Iteration
    • Q-learning
    • TD-Lambda etc.
  • Generalization using function approximation – This seemed to me to be one of the most promising sections of RL. It can effectively take RL outside the confines of Grid world and into the big and continuous state spaces of the real world.
    • For one of our projects, we used DQN (Deep Q-Networks), one of the latest efforts in generalization using deep neural networks, published by DeepMind – a Google company.
  • Reward Shaping – a mechanism to accelerate an agent’s learning and help it reach its goal faster.
  • POMDPs (Partially Observable MDPs) – These are closer to the processes we see in real life. We don’t get to know exactly which state we are in, so we have to work with ‘belief states’, i.e. probability distributions over the states we might be in.
  • Game Theory – I found this to be the most fun part of the course. It deals with stochastic games where multiple agents try to maximise their collective/competing rewards. This is again closer to the situations we face in real life. Topics include:
    • Prisoner’s Dilemma
    • Nash Equilibrium
    • Folk theorem and sub-game perfect equilibrium
    • Tit-for-Tat, Grim trigger, Pavlov etc. game strategies
    • Coordinated equilibria, using side payments (Coco-Q) etc.
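
To make the model-free idea above a little more concrete, here is a minimal tabular Q-learning sketch in Python. It is my own toy illustration, not code from the course: the five-state corridor, the +1 reward at the right end and all the hyperparameters are made-up assumptions.

```python
import random

# Hypothetical toy environment: a corridor of 5 states (0..4).
# The agent starts at state 0; reaching state 4 ends the episode with reward +1.
N_STATES = 5
ACTIONS = [-1, +1]  # move left, move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values from experience only (no model of the corridor)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes, break ties at random,
            # otherwise take the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward the bootstrapped target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(greedy_policy)
```

The agent is never told that walking right is the goal; it only sees rewards, yet the greedy policy read off the learned table heads right from every non-terminal state.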

The course content was a bit too theoretical in some chapters (e.g.: AAA – Advanced Algorithm Analysis). I found the lectures by David Silver of DeepMind to be a good supplement for building the intuition required for this course – link.

One of the really exciting moments in this course was when Prof. Richard Sutton, considered by many as the father of Reinforcement Learning, and the author of the primary textbook for RL (of our course and elsewhere) ‘Reinforcement Learning: An Introduction’ (second edition draft available from author’s website – link) appeared for one of our office hours as a special guest.

Prof. Richard Sutton along with our TAs during an office hour

I found all the TAs for this course really knowledgeable and helpful. All the office hours were really useful and fun-filled at the same time. One of our TAs, Miguel Morales, was featured on the OMSCS website recently – link.

In conclusion, this course has been one helluva ride that I enjoyed throughout! 🙂


Machine Learning – Experience

I recently completed CS 7641 – Machine Learning as part of my OMSCS coursework. The course was really enjoyable and informative.

The course was taught by Professors Charles Isbell and Michael Littman. Both are really awesome. Unlike most other courses on the topic, they have managed to make the course content easy to understand and interesting without losing any of its essence. All videos are structured as conversations between the Profs, where one acts as the teacher and the other as the student – very effective.

All the course videos are available publicly on YouTube – link. Also, I would recommend watching this funny a cappella song on ML, based on Thriller, by the Profs – link. 🙂

The course was a literature survey and general introduction to the various areas in ML. It was primarily divided into 3 modules:

  • Supervised learning – where we are given a dataset with labels (emails classified as spam or not). You try to predict the labels for future data based on what you’ve already seen or ‘learned’.
    • Techniques include Decision Trees, K-Nearest Neighbours, Support Vector Machines (SVM), Neural Networks etc
  • Unsupervised learning – all about finding patterns in unlabeled data. Eg: Group similar products together (clustering) based on customer interactions. This can be really helpful in recommendations etc.
    • Randomized Optimization, clustering, feature selection and transformation etc.
  • Reinforcement learning – the most exciting one (IMHO). This overlaps with many concepts we usually consider part of Artificial Intelligence. RL is about incentivizing machines to learn various tasks (such as playing chess) by providing different rewards.
    • Markov Decision Processes, Game Theory etc.
    • I found the concepts in GT, such as the Prisoner’s Dilemma and Nash Equilibrium, and how they tie into RL interesting.
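
As a small illustration of the supervised learning idea above, here is a sketch of a K-Nearest Neighbours classifier in plain Python. The training data (number of links and exclamation marks per email) is entirely made up for this example, and the function name `knn_predict` is my own; in the assignments one would normally reach for a library such as scikit-learn or Weka instead.

```python
from collections import Counter
import math

# Made-up training data: (number of links, number of exclamation marks) -> label.
TRAIN = [
    ((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"),
    ((5, 4), "spam"), ((6, 3), "spam"), ((4, 6), "spam"),
]


def knn_predict(train, point, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN, (5, 5)))  # its three nearest neighbours are spam examples
print(knn_predict(TRAIN, (1, 1)))  # its three nearest neighbours are ham examples
```

There is no training step at all here: the labelled examples are the model, and a prediction is just a vote among the closest ones.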

All of these are vast subjects in themselves. The assignments were designed in such a way that we got to work with all of these techniques at least to some extent. The choice of languages and libraries was left to us, though guidance and recommendations were provided. Through that, I got the opportunity to work with Weka, scikit-learn and BURLAP.

Overall, I really enjoyed the course. I hope to take courses like Reinforcement Learning (link) in upcoming semesters to learn more about these topics.

The Art of Thinking Clearly

I recently completed reading The Art of Thinking Clearly (link) by Rolf Dobelli. I found the book an interesting, concise and useful read on the many biases of the human mind.

If you go through the reviews of the book on Goodreads (link) or anywhere else online, you are likely to find mixed opinions. The negative ones mostly accuse the author of plagiarism. In fact, N.N. Taleb, the bestselling author of many books including The Black Swan (link), has gone ahead and written a detailed account of the instances where his ideas were plagiarized by Dobelli – link.

Interestingly, these happen to be the exact reasons why I ended up reading Dobelli’s book! Let me explain myself a bit here though that would mean slightly digressing from the subject of this post.

Reading Summary Books

On multiple occasions, I had considered buying books such as Taleb’s The Black Swan and Fooled By Randomness (link) or Nobel laureate Daniel Kahneman’s Thinking, Fast and Slow (link) to understand more about the human mind and its blind spots. Each time, the sheer size of these books has made me put off the task to a distant future.

It might very well be the case that these books (among others) were the first to discuss many of the ideas mentioned in Dobelli’s book and that they discuss these ideas with much more rigor. But for a casual reader like me, who is looking for a high-level overview of the core ideas of these books without putting in the effort of actually reading them, books like Dobelli’s are the best option.

On a related note, I recently came across this extremely nice Youtube channel – link that takes this concept of reading summaries a step further by presenting 5-10 minute illustrative videos that summarise the essence of various famous self-help/philosophical books.

The Art of Thinking Clearly

Getting back to the book, there were two other negative points mentioned in the reviews that I was careful to watch out for while reading:

  1. In an effort to come up with ‘100’ limitations of the human mind, Dobelli has also added many somewhat obvious or insignificant ones to the list. This can make the real insights hard for the casual reader to pick out, especially since they are given in no particular order.
  2. Some of the anecdotes used are contextually inappropriate.

Keeping all these in mind, I’ve been able to get some good insights out of the book. A few of the ones that come to my mind include (not including bias definitions for brevity):

  • Confirmation bias (link) being the mother of all biases. This explains why you will never be able to convince someone in arguments where the topic has inherent uncertainties which are open to interpretations. Political discussions on the internet seem to be a good example.
  • Swimmer’s Body Illusion (link) – Also answers the question ‘Does Harvard make you smarter?’
  • Action Bias (link) – where we feel doing something is more productive than doing nothing, even though what we do might be counter-productive.
  • Effort Justification (link) – where we tend to overvalue something acquired with more effort rather than objectively judging its utility.
  • Illusion of Attention (link) – This one was an eye-opener. Particularly the observation that drivers talking on the phone are as susceptible to accidents as drunk drivers, even when they are on hands-free.
  • Survivorship Bias (link) – probably explains why people fail to understand the risks involved in starting startups and overestimate the chances of success.
  • Forer Bias (link) –  refers to the tendency of people to rate sets of statements as highly accurate for them personally even though the statements could apply to many people.

The entire list of biases can be found here – link.


The Pragmatic Programmer

After having it on my to-do and wish list for about a year, I finally ordered and read ‘The Pragmatic Programmer‘. It was a really interesting read. I was able to relate to many of the chapters in it. The book talks about how programmers can rise from journeymen to masters.

The book contains many (70 to be precise) one line nuggets of programming wisdom. The authors themselves have made these available online here. Coding Horror (Jeff Atwood) also has a handy quick reference to many of the ideas mentioned in the book – link.

Even though the tips by themselves are great, I would recommend reading the whole book rather than reading them in isolation. What makes the book great is the way the authors present the ideas in easy-to-understand ways, often using small stories and analogies. Some of the interesting ones are below:

The Broken Window Theory (wiki):

Consider a building with a few broken windows. If the windows are not repaired, the tendency is for vandals to break a few more windows. Eventually, they may even break into the building, and if it’s unoccupied, perhaps become squatters or light fires inside.

This is how human psychology works. The same is applicable in terms of software quality. If we introduce entropy into the system (in the form of poor code, lack of unit or integration testing, poor review practices etc.), it will spread rapidly and destroy the system. The opposite can also happen where once we establish an immaculate system and great practices, individuals would try not to be the first to lower the standards.

The Stone Soup

The story can be read here. The authors have lessons from both sides of the story:

Tip: Be a Catalyst for Change

Like how the soldiers (or travellers as per the wiki) influenced and brought about change gradually, if we show people a glimpse of the future, they will be more willing to participate.

Tip: Remember the big picture

The villagers fall for the stone trick because they fail to notice gradual changes. This can happen to our software systems and projects as well. The next point is related.

The Boiled Frog

If a frog is put suddenly into boiling water, it will jump out, but if it is put in cold water which is then brought to a boil slowly, it will not perceive the danger and will be cooked to death.

The story is often used as a metaphor for the inability or unwillingness of people to react to or be aware of threats that rise gradually. Gradual increases in CPU/memory utilisation or service latencies which eventually bring down systems come to mind here. Gradual feature creep and project delays which eventually add up to failed projects are also examples.

Some of the programming pearls of wisdom that I found most compelling were:

The Requirements Pit

Requirements are often unclear and mixed with current policies and implementation. We must capture the underlying semantic invariants as requirements and document the specific or current work practices as policy.

Tip: Abstractions live longer than details

The Law of Demeter for Functions (wiki)

An object’s method should call only methods belonging to:

  • Itself
  • Any parameters passed in
  • Objects it creates
  • Component objects

Following this law helps us write ‘shy’ code which minimises coupling between modules.
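
Here is a small Python sketch of what ‘shy’ code can look like. The `Wallet`, `Customer` and `Cashier` classes are hypothetical names I made up for illustration; they are not taken from the book.

```python
class Wallet:
    def __init__(self, balance):
        self.balance = balance

    def debit(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class Customer:
    def __init__(self, wallet):
        self._wallet = wallet  # a component object owned by the customer

    def pay(self, amount):
        # Allowed: calling a method on one of our own component objects.
        self._wallet.debit(amount)

    def balance(self):
        return self._wallet.balance


class Cashier:
    def charge(self, customer, amount):
        # Allowed: calling a method on a parameter passed in.
        # The coupled alternative would be customer._wallet.debit(amount),
        # which reaches through the customer into an object the cashier
        # neither owns, created, nor was handed.
        customer.pay(amount)


customer = Customer(Wallet(50))
Cashier().charge(customer, 20)
print(customer.balance())  # 30
```

Because `Cashier` only knows about `Customer.pay`, the customer is free to change how it stores money (several wallets, a card, etc.) without the cashier code changing.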

Listing other tips below:

  • DRY principle – Don’t Repeat Yourself. Avoid duplication of code or documentation.
  • Orthogonality – Decouple systems into independent components.
  • Always use version control (even for documents, memos, scripts – for everything)
  • Use Domain Specific Languages (DSLs) and Code Generators to simplify development
  • Ruthless testing – Test early, test often, test automatically
  • Use prototypes and tracer bullets wherever and whenever possible


AI for Robotics – Experience

I took the AI for Robotics class in Summer’16 as part of the OMSCS program. It was a really interesting and challenging experience. It was taught by Prof. Sebastian Thrun, who led the self-driving car project at Google. It was his team from Stanford which won the DARPA Grand Challenge in 2005, where they drove a car (Stanley) over 212 km of off-road course and came first. Incidentally, Prof. Thrun is a co-founder of Udacity and was its CEO until recently.

The class consisted of two portions: 

  • a series of lectures combined with small programming tasks
  • two open-ended projects related to self-driving cars

The whole course centers around the use of probabilistic models to predict the various parameters involved such as the location of the robot car, the location of various landmarks, obstacles, moving targets such as other cars, pedestrians etc. The Prof also has an aptly titled text book ‘Probabilistic Robotics’ to go along with the course (though I couldn’t make much use of it).

The lectures covered the following topics:


Noise is an essential part of robotics.

There will be noise in the robot motion. Eg: If we instruct the robot to move 5 meters, the robot might end up moving only 4.8 meters due to tire slip or an uneven surface.

There will be noise in sensor measurement. Eg: If the sensor readings tell us we are 3 meters from the car ahead, the actual distance might be 2.7 meters.

How can a robot car navigate the road safely given all this noise? That is exactly what localization addresses. The term refers to various techniques which help us ‘see through’ the noise and estimate the robot’s true position and motion. The following localization techniques were taught in class:

  • Kalman filters: These work best for linear motion. The estimates here are Gaussian distributions and hence uni-modal, i.e. the filter only tells us the single most probable location of the robot (no info on the 2nd or 3rd most probable locations etc.). There are extensions of the standard KF, such as the Extended KF and the Unscented KF, which address the linearity limitation.
  • Particle filters: These seem best suited for localization since they work for non-linear motions and support multi-modal distributions.
Localization in action: Hex bug path in black and localized particle in blue
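
To show the measure-then-move cycle behind a Kalman filter, here is a minimal one-dimensional sketch in Python. The function names, the sensor readings, the commanded motions and the noise variances are all assumptions I chose for illustration; a real robot would track a multi-dimensional state with matrices.

```python
def measurement_update(mean, var, z, z_var):
    """Fuse the current Gaussian belief with a noisy measurement z."""
    new_mean = (z_var * mean + var * z) / (var + z_var)
    new_var = (var * z_var) / (var + z_var)
    return new_mean, new_var


def motion_predict(mean, var, u, u_var):
    """Shift the belief by the commanded motion u; uncertainty grows."""
    return mean + u, var + u_var


measurements = [5.0, 6.0, 7.0, 9.0, 10.0]  # made-up sensor readings
motions = [1.0, 1.0, 2.0, 1.0, 1.0]        # made-up commanded moves
mean, var = 0.0, 1000.0                    # very uncertain initial belief

for z, u in zip(measurements, motions):
    mean, var = measurement_update(mean, var, z, z_var=4.0)
    mean, var = motion_predict(mean, var, u, u_var=2.0)
    print(f"position estimate {mean:.2f}, variance {var:.2f}")
```

Each measurement shrinks the variance and each motion grows it again. The belief always stays a single Gaussian, which is exactly the uni-modal limitation mentioned above.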


Self-driving cars also need to find the optimal path to their destination. The technique used for finding the optimal path without exploring the entire state space is the A* algorithm. Those who learned AI as undergraduates might be familiar with the approach. It uses a heuristic function which scores each possible move by an estimate of how far the resulting state is from the goal state; A* adds this estimate to the cost already travelled when choosing which state to expand next.
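
A minimal sketch of A* on a small grid, in Python, might look like the following. The grid layout, the unit step cost and the choice of Manhattan distance as the heuristic are my own assumptions for illustration.

```python
import heapq

# Made-up grid: 0 is free road, 1 is an obstacle.
GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]


def a_star(grid, start, goal):
    """Return the number of moves on a shortest 4-connected path, or None."""
    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]  # entries are (g + h, g, cell)
    best_cost = {start: 0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best_cost[cell]:
            continue  # stale queue entry, a cheaper route was found later
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


print(a_star(GRID, (0, 0), (3, 3)))
```

The priority queue is ordered by cost so far plus the heuristic estimate, so cells that look like detours are expanded late or not at all.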

Control Theory

Humans drive cars smoothly. If we ask a robot to follow a particular course, by default it will either overshoot or undershoot its goal and then correct itself. This is because of the inherent delay in the move-sense feedback cycle. This keeps repeating, leading to a zig-zag motion and an overall unpleasant (and potentially dangerous) driving experience. There is a whole domain of control systems devoted to smoothing out the robot’s motion as it approaches its desired course.

The technique we learned is the PID controller. This controller adjusts the steering angle of the robot at every point of its motion based on proportional, integral and derivative terms computed from its CTE, or cross track error (the lateral distance between the robot and the reference trajectory).

Here A represents robot motion without any controller and B represents one with PID controller.
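
To show how the three PID terms combine, here is a small Python sketch. The vehicle model is deliberately crude and entirely my own assumption: each step the cross track error changes by the control output plus a constant drift (standing in for something like a steering bias). The gains are hand-picked for this toy model, not tuned values from the class.

```python
def simulate(kp, ki, kd, steps=100, start_cte=1.0, drift=0.2):
    """Simulate a crude lateral model and return the final cross track error.

    Hypothetical model: each step the cross track error changes by the
    control output plus a constant drift.
    """
    cte = start_cte
    prev_cte = cte
    integral = 0.0
    for _ in range(steps):
        integral += cte              # accumulated error (I term)
        derivative = cte - prev_cte  # change in error since last step (D term)
        control = -kp * cte - kd * derivative - ki * integral
        prev_cte = cte
        cte = cte + control + drift
    return cte


print(f"P only: {simulate(kp=0.5, ki=0.0, kd=0.0):.3f}")
print(f"PID:    {simulate(kp=0.5, ki=0.1, kd=0.1):.3f}")
```

In this toy model the proportional term alone settles at a constant offset from the reference line because of the drift, while adding the integral term lets the controller cancel the drift and bring the error to zero.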


Runaway robot

The first project was a set of 4 interesting challenges (plus a bonus challenge for the extra smart ones) where we needed to locate a robot (aptly named 404) which had run away from an assembly line and capture it using a hunter bot. This was an individual project. It required some ingenuity to come up with a working solution since the lessons from class were not directly applicable here.

Hunter bot (blue) chasing the runaway bot (black). The red dots are future predictions with which the hunter tries to capture the bot.

Hex bug motion prediction

The second project was a team project. Here we were given the coordinates of the random movements of a hex bug for 2 minutes at 30 fps (frames per second). We needed to predict the last 2 seconds, i.e. 60 frames, of the bug’s motion. This was an open-ended problem and we could use any technique from inside or outside the class. We were a team of 4 and explored various techniques, including clustering trajectories and building a Markov model, before finally settling on a particle filter.

Predictions of hex bug path using various approaches against actual bug path (in black)

Overall, I enjoyed the class a lot!

Knowledge based AI – Experience

I enrolled in the Spring’16 batch of the OMSCS program offered by Georgia Tech and Udacity. As my first course in the program, I chose Knowledge-Based AI, taught by Prof. Ashok Goel and David Joyner. The video sessions of the course can be freely accessed through the Udacity website here.

The class was very interesting and insightful. I thoroughly enjoyed the 3 main projects we had to do throughout the class. The class as a whole focused on systematically studying human-level intelligence/cognition and seeing how we can build that using technology. As a means of assessing human cognition, the class used Raven’s Progressive Matrices (RPM).

Our class came into the internet limelight after the course when Prof. Ashok revealed that one of our TAs, Jill Watson, was actually a bot. This was covered widely by the press – Washington Post and WSJ.

With regard to the course content, we were introduced to the broad areas that come under human level AI research such as:

  • Semantic Networks, Frames and Scripts – useful knowledge representations
  • Generate & Test and Means-End Analysis – two popular problem solving techniques
  • Production systems – rule based systems which are useful in AI
  • Learning by Recording Cases and Case-Based Reasoning – techniques to learn from past examples and adapt them as required
  • Incremental concept learning – contrary to approaches like ML where we feed millions of examples to train models, human cognition deals with incremental data inputs. Incremental concept learning describes how we can use generalizations and specializations to make inferences from these inputs.
  • Classification – mapping percepts to concepts so that we can take actions
  • Formal Logic – techniques from predicate calculus such as resolution theorem proving are useful in some cases of reasoning. However human cognition is inductive and abductive in nature, whereas logic is deductive.
  • Understanding – Humans always deal with ambiguity. Eg: The same word can have different meanings in different contexts. Understanding is how we leverage the available constraints to resolve ambiguities.
  • Common sense reasoning – about modeling our world in terms of a set of primitive actions and their combination.
  • Explanation based learning and Analogical reasoning – deals with how we can extract ‘abstractions’ from prior knowledge and transfer these to new situations
  • Diagnosis and learning by correcting mistakes – Determining what is wrong with a malfunctioning device/system. Learning by correcting these mistakes.
  • Meta-reasoning – reasoning about our own reasoning process and applying all the above techniques to the same.

As part of the projects, we built small ‘AI agents’ that process RPM problem images and use various techniques to solve them. These were quite challenging and required a fair amount of programming. For me, it was a good opportunity to build something non-trivial in Python. We used the Python image processing library PIL and various image processing techniques like connected component labelling.
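
For readers unfamiliar with connected component labelling, here is a minimal sketch in plain Python. It is not the code from my project: the tiny binary image is made up, and it works on a nested list rather than on a PIL image so that it stays self-contained.

```python
from collections import deque

# Made-up binary image: 1 is a dark pixel, 0 is background.
IMAGE = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
]


def label_components(image):
    """Label 4-connected groups of 1-pixels; return (labels, component_count)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 and labels[r][c] == 0:
                # Found an unlabelled dark pixel: flood-fill its whole component.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    cr, cc = queue.popleft()
                    for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and image[nr][nc] == 1 and labels[nr][nc] == 0):
                            labels[nr][nc] = count
                            queue.append((nr, nc))
    return labels, count


labels, count = label_components(IMAGE)
print(count)
for row in labels:
    print(row)
```

Once every shape in an RPM frame has its own label, an agent can count shapes, compare their sizes and positions across frames, and reason about what changed.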

Considering the vastness of the topic (human-level intelligence) and that I intend to specialize in ‘Interactive Intelligence’, I found the course really interesting and informative. Enjoyed it!

I’m adding a nice poster created by my classmate Eric on the high level topics we studied in this class.






Google summer of code 2014 – Experience

This post has been long pending. In fact, I took part in GSoC 2014, and the GSoC 2015 application process has already started. But I’ll share my experience anyway.

I interned with Raxa. They build web and mobile applications to help small clinics and hospitals go online. Their applications are built on top of the OpenMRS platform. Raxa was fully open source earlier but has since moved to a hybrid model. All the GSoC projects are open source and available through GitHub.

I started late in the application process. I didn’t have specific plans for applying to GSoC this time. The list of organizations had been announced and the application deadline was pretty close. One fine day, I thought I’d just glance over the organizations that had been accepted and see if any matched my areas of expertise. Raxa was planning to enable patients without smartphones to access their medical services using phone calls via IVR (Interactive Voice Response) and via SMS. This required knowledge of an open source telephony server called Asterisk. I had spent a decent chunk of my third year working on a social initiative/startup called Dial Blood, which was built on Asterisk. Much of my final year was spent working on Findauto, an SMS-based auto rickshaw booking startup. Hence it made a lot of sense for me to apply for this project.

My application emphasized why I was the right person to work on this project and contained a clear, systematic week-by-week breakdown of how I would go about completing it. The key features I proposed to enable were:

  • Appointment scheduling via SMS or call
  • Calling/submitting queries to doctors

One piece of feedback I got on my application was that there was scope for more to be done in three months. They wanted to make sure I was productively engaged throughout the three months, with additional tasks to take up if I managed to finish these early. IVR can at times seem complicated to the rural audience Raxa was primarily trying to appeal to. Hence, instead of asking people to press digits (press 1 for English, 2 for Hindi..), I thought of using voice recognition and natural language processing to hear their response and act accordingly. The inclusion of this task made my application reasonably strong.

I applied only to Raxa and thankfully made it when the results came out. They had selected four interns: one girl from Sri Lanka, one IIT Delhi grad who had previously interned with them outside GSoC, another student who was pursuing his master’s degree in computer science, and me. They were all working on really interesting projects, ranging from mobile application development to machine learning. More details can be found here.

We also used to have weekly meetings with the whole Raxa team. Our progress was evaluated each week. I was assigned a mentor, and I could reach out to him anytime for assistance. My project was a continuation of the previous year’s GSoC project. I had some difficulty ramping up on the code because of limited documentation. Apart from that, the Asterisk part went smoothly. For the voice recognition part, I was planning to use Google’s voice recognition APIs. But Google had deprecated the free version by then. Other popular alternatives like Sphinx had limitations working with Asterisk because of some frequency mismatch issues. This was a blocker for me. My mentor helped me identify an NLP startup (Wit.AI) working in this domain. They had really powerful APIs available for free. I used those in my GSoC project and managed to implement the feature successfully.

Another main task of mine was researching the best way to take the service into production. I explored various options, including Amazon EC2, our own servers and third-party services. I also contacted various telephony service providers to inquire about PRI lines and their pricing. We were not able to take the service into production because of some constraints Raxa had. But the code was complete and production-ready by the end of summer.

We had a final demo day where all 4 of us presented our projects. We all managed to complete GSoC successfully. It was a really good experience for me to work on such a big project end to end under strict time constraints. Not to mention the stipend and awesome goodies.. 🙂

My GSoC project code link –

Wiki link –
