Intro to High Performance Computing – Experience

This Spring, I completed the Intro to High Performance Computing course (CSE 6220) as part of OMSCS. It is one of the hardest (4.5/5) and also one of the highest-rated (4.8/5) courses in the program according to OMS Central. Based on my experience, I concur with both ratings.

At a high level, the course covers the algorithmic aspects of maximizing the performance of your code. This includes parallelizing your code across all processors or across multiple machines, and exploiting the memory hierarchy to your advantage. The other ‘high performance’ course in the program – High Performance Computer Architectures (CS 6290) – in contrast, discusses maximizing performance more at the processor architecture level.

Prof. Vuduc deserves special mention. He has put a lot of effort into making the course videos easy to understand and interesting. His hilarious antics make you laugh even while discussing the most complex topics. He is also very active on Piazza and participates in the office hours regularly.

There were 5 hands-on projects in total, all in C/C++, with one due every two weeks. These were really the time-sinks. Interestingly, these were also the most fun part of the course in my experience. These involved implementing the algorithms taught in the lectures, making everything you learn more ‘real’.

Key concepts

At a broad level, these were the key concepts I learned from the course:

  1. Shared memory model (aka dynamic multithreading model)
    1. Concepts of work and span in analyzing parallel programs.
    2. Introduction to OpenMP library.
  2. Distributed memory models
    1. Parallel computing across network using message passing.
    2. The Alpha-Beta model (aka latency & inverse-bandwidth model) for analyzing distributed parallel programs.
    3. Introduction to OpenMPI library.
  3. Two level memory model
    1. I/O aware algorithms that can exploit the cache and main memory structures.
    2. Cache oblivious algorithms that still achieve optimal performance without being aware of the cache/memory structures.
Figure: The Alpha-Beta model for measuring the cost of distributed computing
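
As a rough illustration of the Alpha-Beta model: the time to send a message of n words is commonly modelled as alpha + beta * n, where alpha is the per-message latency and beta is the inverse bandwidth (time per word). The sketch below compares sending the same data as one large message versus many small ones. The function names and the machine parameters are made-up assumptions for illustration, not values from the course.

```python
def message_time(n_words, alpha, beta):
    """Alpha-Beta cost of sending one message of n_words words."""
    return alpha + beta * n_words


def total_time(n_words, n_messages, alpha, beta):
    """Cost of sending n_words split evenly across n_messages messages."""
    per_message = n_words / n_messages
    return n_messages * message_time(per_message, alpha, beta)


# Hypothetical machine parameters (assumptions for illustration only).
ALPHA = 1e-6  # latency: seconds per message
BETA = 1e-9   # inverse bandwidth: seconds per word

one_big = total_time(1_000_000, 1, ALPHA, BETA)
many_small = total_time(1_000_000, 1000, ALPHA, BETA)
print(one_big, many_small)
```

The bandwidth term is the same in both cases; the many-small-messages variant pays the latency term 1000 times, which is why batching communication tends to help under this model.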

For a more detailed overview, I would recommend going over the Course Syllabus (PDF).

Overall, I really enjoyed the course and am happy that I decided to take it, even though it was not directly related to my specialization (Interactive Intelligence).


Hyperledger Sawtooth – Introductory tutorial

In this blog post, I will try to quickly cover Hyperledger Sawtooth basics and explain how to get to a productive development setup starting from the Sawtooth core codebase.

Hyperledger is an open-source collaborative effort to create industry standard blockchain technologies. It is backed by the Linux Foundation. You can read more about the effort here – https://www.hyperledger.org/about.

This edX course would be a good starting place for anyone interested in Hyperledger – https://www.edx.org/course/blockchain-business-introduction-linuxfoundationx-lfs171x.

Sawtooth is one of the mature projects (1.0 released in Jan 2018) within Hyperledger, initially contributed by Intel. It is designed with enterprise use-cases in mind. Among other things, Sawtooth introduces a novel consensus algorithm called Proof of Elapsed Time (PoET) and supports permissioned private blockchains, parallel transaction execution, dynamic switching of consensus algorithms and compatibility with Ethereum (via the Seth transaction family). It also has development support for a broad range of languages, including Python, Go and Java.

Why this post?

Sawtooth is one of the best-documented pieces of open-source software that I’ve come across. I would recommend that anyone interested in learning Sawtooth go through the official documentation.

Having said that, I feel that the level of information in the docs can be a bit daunting, especially for someone new to blockchain. Also, in my case, the fact that I wasn’t very comfortable with Docker added a layer of indirection/complexity to the learning process. I need to be able to see how something starts up, beginning from the code, to fully understand how it works.

I couldn’t find any tutorials on Sawtooth outside the documentation as of today. I reached out to the friendly experts hanging out in the Sawtooth Rocket chat channel to clear up many issues I came across while trying to play with Sawtooth. This post is my effort to give back to the community, in the form of a basic intro to setting up Sawtooth from scratch.

Components

Figure: Sawtooth components

Transaction Processor

The transaction processor maintains all domain specific business logic for a transaction family (similar to smart contracts) such as:

  1. Serializing/Deserializing the state stored in the ledger to domain objects.
  2. Providing the business logic for validating transactions
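
To make these two responsibilities concrete, here is a minimal plain-Python sketch for a toy integer key-value family (similar in spirit to the Intkey example used later in this post). It deliberately does not use the Sawtooth SDK: the function names, the JSON encoding, the validation rules and the exception type are all assumptions made for illustration, and the real Intkey family defines its own payload encoding and handler class.

```python
import json


class InvalidTransaction(Exception):
    """Raised when a transaction violates the business rules (hypothetical)."""


def deserialize_state(raw):
    """Turn the bytes stored in the ledger into a domain object (a dict)."""
    return json.loads(raw.decode("utf-8")) if raw else {}


def serialize_state(state):
    """Turn the domain object back into bytes for the ledger."""
    return json.dumps(state, sort_keys=True).encode("utf-8")


def apply_transaction(raw_state, verb, name, value):
    """Validate one transaction and return the new serialized state."""
    state = deserialize_state(raw_state)
    if verb == "set":
        if name in state:
            raise InvalidTransaction(f"{name} is already set")
        state[name] = value
    elif verb in ("inc", "dec"):
        if name not in state:
            raise InvalidTransaction(f"{name} is not set")
        state[name] += value if verb == "inc" else -value
    else:
        raise InvalidTransaction(f"unknown verb: {verb}")
    if state[name] < 0:
        # Assumed rule for this sketch: values must stay non-negative.
        raise InvalidTransaction("values must stay non-negative")
    return serialize_state(state)


raw = apply_transaction(b"", "set", "a", 10)
raw = apply_transaction(raw, "inc", "a", 2)
print(deserialize_state(raw))
```

Note that nothing here deals with consensus, peers or block storage; in Sawtooth that part is left to the validator described next.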

Validator

The validator handles consensus, all blockchain reads and writes, peer interaction etc. It is agnostic of the business logic. In fact, one validator process running on a host can be connected to multiple transaction processor processes.

Client

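Clients (CLI, web services etc.) do not talk to a transaction processor directly. They submit transactions through a REST API (a separate process), which passes them to the validator; the validator then routes each transaction to the matching transaction processor.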

If you are interested in playing around with Sawtooth using pre-built images, the installation section of the Sawtooth documentation would be helpful. You can use Docker images, Ubuntu packages or install Sawtooth from the AWS marketplace.

Getting your hands dirty

In this section, I will try to get a working instance of the Intkey transaction processor (TP) running locally in Python. Intkey is a sample TP for setting, incrementing and decrementing values in a dictionary. First of all, check out the sawtooth-core codebase:

git clone https://github.com/hyperledger/sawtooth-core

We will still use a provided Docker container for running all our processes, which saves us the trouble of getting the right versions of all the required dependencies. In our case, we will use the sawtooth-dev-python image. Also, we will use the validator and REST API as provided (not run them from code), as these don’t need to be changed by application developers.

Setup the validator

Build everything (install docker if you haven’t):

$ cd sawtooth-core/docker
$ docker build . -f sawtooth-dev-python -t sawtooth-dev-python

Now if you list your docker images, you should see the newly created one:

$ docker images
REPOSITORY          TAG    IMAGE ID     CREATED    SIZE
sawtooth-dev-python latest ecf3ae086754 2 days ago 817MB

Next, let’s start a container and get inside it:

$ cd ..
$ docker run -v $(pwd):/project/sawtooth-core -it sawtooth-dev-python bash

This should start a container using the sawtooth-dev-python image and get you to a root prompt within the container, with our present working directory (sawtooth-core) mounted as /project/sawtooth-core.

root@43edef3881be:/project/sawtooth-core# ls

Before we start up the validator, we need to do some setup here. Basically, we need to create the required keys and the genesis block.

# sawtooth keygen
# sawset genesis
# sawadm genesis config-genesis.batch

Next, let’s start up the validator:

# sawadm keygen
# cd bin/
# sawtooth-validator -vv

You will want to keep the validator running for the rest of the steps to work, so you will need to connect to the container from a different terminal. To do that, first find the container id of the container that you just created:

$ docker container ls
CONTAINER ID IMAGE               COMMAND CREATED   STATUS    PORTS              NAMES
43edef3881be sawtooth-dev-python "bash" 8 days ago Up 8 days 4004/tcp, 8008/tcp keen_keller

Now, to get into the container from this new terminal, you just need to do the following:

$ docker exec -it 43edef3881be bash

Now, we need to start up the settings transaction processor which is required for the validator to function correctly:

# cd bin/
# settings-tp -vv

You should see some logging in the validator terminal tab/window along these lines now:

[2018-04-12 07:25:27.657 INFO processor_handlers] registered transaction processor: connection_id=f519f4ff0bd1e26968d5bcd76cd71eed88d097c5a4846798c35f5d9c5efeb8845ea65fdf2172c12c7b4a226fc4214c18f6989c4048e0012c2fb5378252d67a08, family=sawtooth_settings, version=1.0, namespaces=['000000'], max_occupancy=10
[2018-04-12 07:25:27.705 INFO genesis] Genesis block created: 69f85216c3137fdc0ee31f8d754dfd134198b6d97f31c45e52ccbb3af356bbbe443aa38ba6a3380a7aab734b939881dbb198367311423b0b8d9aa39569d186eb (block_num:0, state:2cd74660fc59b472699aad3f6da884ae636191673e6773e6d81b9c8987065e9f, previous_block_id:0000000000000000)
[2018-04-12 07:25:27.709 INFO interconnect] Listening on tcp://127.0.0.1:8800
[2018-04-12 07:25:27.712 INFO chain] Chain controller initialized with chain head: 69f85216c3137fdc0ee31f8d754dfd134198b6d97f31c45e52ccbb3af356bbbe443aa38ba6a3380a7aab734b939881dbb198367311423b0b8d9aa39569d186eb (block_num:0, state:2cd74660fc59b472699aad3f6da884ae636191673e6773e6d81b9c8987065e9f, previous_block_id:0000000000000000)

There is one more piece required before we can get to our Intkey transaction processor – the REST API. Again, from a new terminal, get into the container and run:

# cd bin/
# sawtooth-rest-api -vv

At this point, you have a functional Sawtooth validator. You can play around with it using the sawtooth command. Some examples:

# sawtooth state list 
ADDRESS SIZE DATA
000000a87cb5eafdcca6a8cde0fb0dec1400c5ab274474a6aa82c12840f169a04216b7 110 b'\n...
HEAD BLOCK: "69f85216c3137fdc0ee31f8d754dfd134198b6d97f31c45e52ccbb3af356bbbe443aa38ba6a3380a7aab734b939881dbb198367311423b0b8d9aa39569d186eb"
# sawtooth transaction list
TRANSACTION_ID FAMILY VERS SIZE PAYLOAD
5964642adb957ca994f5789e9a5f9930853c20359a8adbc5c18ecf4b338fc9a00f544f42cf17d7cdc79531727fbdb307f4143c8104ec9ed0e5c3557a476cccdb sawtooth_settings 1.0 131 b'\x08\...

You can also directly query the REST API as follows:

# curl http://localhost:8008/blocks
{
 "data": [
 {
 "batches": [
 {
 "header": {
 "signer_public_key": "0317f01f8958bc6d404d1ffe88770fe927fef63022216f24484906760873501d7f",
 "transaction_ids": [
 "5964642adb957ca994f5789e9a5f9930853c20359a8adbc5c18ecf4b338fc9a00f544f42cf17d7cdc79531727fbdb307f4143c8104ec9ed0e5c3557a476cccdb"
 ]
 }.. // trimmed to keep it short
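
The same response can also be processed from Python. The sketch below uses only the standard library and relies only on the fields visible in the trimmed output above (data, batches, header, transaction_ids). The helper names and the sample dictionary are hypothetical; fetch_blocks assumes the REST API is listening on localhost:8008 as in the curl call.

```python
import json
from urllib.request import urlopen


def fetch_blocks(base_url="http://localhost:8008"):
    """Fetch the /blocks resource and decode the JSON body."""
    with urlopen(base_url + "/blocks") as response:
        return json.load(response)


def transaction_ids(blocks_response):
    """Collect every transaction id listed in the batch headers."""
    ids = []
    for block in blocks_response.get("data", []):
        for batch in block.get("batches", []):
            ids.extend(batch.get("header", {}).get("transaction_ids", []))
    return ids


# Stand-in for a real response, with placeholder ids (not real data).
sample = {
    "data": [
        {"batches": [{"header": {"transaction_ids": ["txn-1", "txn-2"]}}]},
        {"batches": [{"header": {"transaction_ids": ["txn-3"]}}]},
    ]
}
print(transaction_ids(sample))
```

With the validator and REST API from the previous steps running, transaction_ids(fetch_blocks()) should list the ids from the live chain instead of the sample.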

Intkey

The Transaction Processor

You’ll notice that pre-built Intkey transaction processor binaries are also available in the bin folder. However, let’s try to run it directly from the code. The main.py in the following directory actually starts the example Intkey processor:

/project/sawtooth-core/sdk/examples/intkey_python/sawtooth_intkey/processor/main.py

We need python3 for some of the dependencies. But if you try to run main.py directly with python3, you will run into a couple of issues. You’ll notice it depends on the sawtooth_sdk module, which in turn depends on the sawtooth_signing module. Let’s build them first.

// First, let’s install sawtooth_signing
# cd /project/sawtooth-core/signing/
# python3 setup.py install

// Next, let’s install sawtooth_sdk
# cd /project/sawtooth-core/sdk/python/sawtooth_sdk
# python3 setup.py install

Now, we need to tweak the main.py file a bit so that it works with the run-from-source approach.

It tries to import from the sawtooth_intkey.processor.handler module, which it will fail to find. The handler.py is, however, right beside main.py in the same directory. So let’s modify main.py to use that:

- from sawtooth_intkey.processor.handler import IntkeyTransactionHandler
+ from handler import IntkeyTransactionHandler

At this point, we should be able to run main.py without any errors. But it will exit without doing anything. That’s because no one is calling the main method. Let’s add a bit of code at the end of the file to do that:

if __name__ == "__main__":
    main()

Notice that you can modify the file on your host machine itself, i.e. outside the container. Since the sawtooth-core folder is shared with the container, the container always sees the latest code.

Now, your Intkey transaction processor is ready to roll! You can try some of the commands as follows:

# python3 main.py -V
sawtooth-intkey (Hyperledger Sawtooth) version UNKNOWN

Here is how you really start it:

# python3 main.py -vv
[2018-04-12 08:09:31.254 INFO core] register attempt: OK

You should see the validator also logging in its terminal the fact that intkey TP connected successfully. Something like:

[2018-04-12 08:09:31.248 INFO processor_handlers] registered transaction processor: connection_id=acf1ed62db9a2e18c88af958532ac7c9857d6db50d951348eabda8ba5e0913e9a4deeb3fa607cbf6897efdc0585559ea38470cfbbe96141668369db16d3f45a5, family=intkey, version=1.0, namespaces=['1cf126'], max_occupancy=10

Intkey CLI

Let’s try running the CLI to interact with the Intkey transaction processor.

You will notice that the provided intkey_cli.py file references other files in its directory using the sawtooth_intkey.client_cli.xyz path. For it to be able to find these files correctly, let’s copy it to the appropriate location:

# cd /project/sawtooth-core/sdk/examples/intkey_python/
# cp sawtooth_intkey/client_cli/intkey_cli.py .

Next, similar to the earlier main.py, we need to add an entry point for the intkey_cli.py. Add the following snippet at the end of the file:

if __name__ == '__main__':
    main_wrapper()

This completes our setup! Let’s play around with it:

# python3 intkey_cli.py --help
usage: intkey_cli.py [-h] [-v] [-V]
 {set,inc,dec,show,list,generate,load,populate,create_batch,workload}
 ...

optional arguments:
 -h, --help show this help message and exit
 -v, --verbose enable more verbose output
 -V, --version display version information

subcommands:
 {set,inc,dec,show,list,generate,load,populate,create_batch,workload}
 set Sets an intkey value
 inc Increments an intkey value
 dec Decrements an intkey value
 show Displays the specified intkey value
 list Displays all intkey values

If you list all the entries in the blockchain now, it should be empty (as we did not insert any):

# python3 intkey_cli.py list

Now, let’s try setting and getting a key-value pair:

// Set key 'a' to value 10
# python3 intkey_cli.py set a 10
{
 "link": "http://127.0.0.1:8008/batch_statuses?id=46f82450d39c6c59968efc83a15669477804ccb9ad569a7b61bebecf6cf55f931e0bd19c4de42926b7ce90b320b3ffe50e411130957b1812f8e4f2a45865c8ed"
}

// Let's try listing again
# python3 intkey_cli.py list
a: 10

// Let's try reading the key
# python3 intkey_cli.py show a
a: 10

// Let's try incrementing 'a' by 2
# python3 intkey_cli.py inc a 2
{
 "link": "http://127.0.0.1:8008/batch_statuses?id=8d2f5e0ef324ba15a5cd95392cd65caf6df7124f7c06dc0531feb77cfda49f047c765cb96557a3de0f91902e7b2b3212111b997fe663c7b9bb610bd7a7ad6759"
}

// Show again
# python3 intkey_cli.py show a
a: 12

You can also do batch import of random keys (for testing, measuring throughput etc.) as follows:

// Create 1 batch file with operations on 5 keys
# python3 intkey_cli.py create_batch -K 5 -c 1
Writing to batches.intkey...

// Load the batch file
# python3 intkey_cli.py load
batches: 2 batch/sec: 72.39234705765597

// List the added keys
# python3 intkey_cli.py list
a: 12
ZmyetF: 30086
ReocMV: 59247
CINQSf: 57819
BWADZo: 39267
RoDdEV: 47475

You should be able to see the corresponding logging for each of these commands in the REST API, transaction processor and validator terminal consoles.

In summary, we started by checking out the sawtooth-core codebase and ended with a docker container running the following:

  1. Validator
  2. Settings Transaction Processor
  3. REST API
  4. Intkey Transaction Processor (from source)
  5. Intkey CLI (from source)

This concludes my hands-on overview of working with Hyperledger Sawtooth.

I hope this will be helpful as a good starting place for developers trying to create their own custom transaction families in Sawtooth. In case you run into any issues while following these instructions or have any questions, please let me know. I’ll try to help you out. As I mentioned earlier, the Sawtooth Rocket chat is a good place to get help from experts in case you are stuck.

Spacemacs – Among the stars aboard the Evil flagship (vim)


I’ve been using Spacemacs as my IDE/editor of choice for about half a year now. I absolutely love it! Its basic premise is that

“The best editor is neither Emacs nor Vim, it’s Emacs and Vim!”.

In this post, I will try to explain briefly what it is and how I use it in a way that is newbie friendly.

Background

I’ve been programming for over 10 years now, professionally for about 4 years. I’ve worked in a variety of different languages over time – HTML/CSS/JS, C/C++, PHP, Python, Java and most recently Clojure.

Till now, I did not have a consistent IDE/Editor. For web development, I used to use Sublime and later Atom. For Python, PyCharm. For Java, I used to use Eclipse and switched to IntelliJ later (I still use IntelliJ + IdeaVIM for Java as I will explain later). I used to use basic Emacs in college for C/C++ projects and basic Vim at work whenever I needed to quickly make small changes to files when sshed into servers.

The original reason I made the switch to Spacemacs was that I couldn’t find a good IDE for Clojure elsewhere. CounterClockwise, the Eclipse Clojure plugin, is not being actively maintained. Cursive, the IntelliJ plugin, is paid. The most recommended, actively developed, open-source IDE for Clojure, CIDER (Clojure(Script) Interactive Development Environment that Rocks), was available only for Emacs and its derivatives such as Spacemacs.

Figure: Clojure development using CIDER, complete with a fully featured REPL having auto-complete and doc-string tooltips.

However, now Spacemacs in Evil (Vim) mode has become my IDE/editor of choice for most languages and for any editing outside programming as well. Spacemacs makes it a lot easier to get to a really productive state without having to do a lot of research online, tuning your dotfiles, installing plugins etc – it comes with batteries included!

The Vim Modal editing

Spacemacs supports 3 modes of usage:

  1. Among the stars aboard the Evil flagship (vim)
  2. On the planet Emacs in the Holy control tower (emacs)
  3. A hybrid mode combining 1 and 2

If you are new to both Emacs and Vim, I strongly recommend using Spacemacs in Evil (Vim) mode. Emacs/Hybrid mode needs to be considered only if you are already comfortable with the Emacs keybindings.

Vim or its modal editing lets you edit text at the Speed of Thought. Many people argue that coding is more about thinking and less about churning out code, and that the small time savings you get from optimizations such as Vim are negligible. I disagree:

  1. Vim makes editing as unintrusive to your thinking as possible.
  2. The small savings add up over your career as a coder.
  3. As a programmer, having a hackable/fully-programmable editor gives you the flexibility to program your editor as per your needs.

Sylvain Benner, the creator of Spacemacs has written an excellent short piece on ‘Modal editing – why and how‘ that I highly recommend. In fact, I recommend going through that whole article (Beginner tutorial) if you are new to Spacemacs.

Vim itself has a steep initial learning curve. But it pays off greatly after the first few weeks if you write code as a profession. Some helpful/interesting Vim resources to get you started:

  1. Why Vim? – http://www.viemu.com/a-why-vi-vim.html
  2. Vim interactive tutorial – http://www.openvim.com/
  3. Vim adventures (interactive game) – https://vim-adventures.com/
  4. Vim Creep (story) – https://www.norfolkwinters.com/vim-creep/
  5. Vim cheatsheet – http://www.viemu.com/a_vi_vim_graphical_cheat_sheet_tutorial.html

There are a lot of good video tutorials on Vim on YouTube too, if that’s what you prefer. Here is one I liked – Learning Vim in a week (link).

In short, Vim lets you edit text similar to how you think about editing text. Examples:

  1. Typing ciw lets you change the current word (change in word)
  2. Typing di" lets you delete everything within a pair of double quotes (delete in ")
  3. Typing dt> lets you delete everything till > (delete till >)

You get the idea. These are executed in the normal mode, where you will spend most time. There is much to be said about the composability of vim commands too (some other time).
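
To show what the last two commands above do to a line of text, here is a small Python emulation. It is a simplified sketch and not how Vim implements these commands: the function names are hypothetical, the cursor is a plain 0-based index, and delete_inside assumes the cursor sits inside the quoted text.

```python
def delete_till(line, cursor, target):
    """Emulate dt<target>: delete from the cursor up to (not including) target."""
    end = line.find(target, cursor + 1)
    if end == -1:
        return line  # target not found after the cursor: leave the line alone
    return line[:cursor] + line[end:]


def delete_inside(line, cursor, quote='"'):
    """Emulate di<quote>: delete the text between the quotes around the cursor."""
    start = line.rfind(quote, 0, cursor + 1)
    end = line.find(quote, cursor + 1)
    if start == -1 or end == -1:
        return line
    return line[:start + 1] + line[end:]


line = '<a href="old.html">'
print(delete_inside(line, 9))     # cursor on the 'o' of old.html
print(delete_till(line, 1, ">"))  # cursor on the 'a' of the tag name
```

Each command is an operator (d for delete) combined with a description of the text to act on, which is what makes them read like short sentences.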

Pro tip: Once you start using Vim, it is highly recommended that you remap your Caps Lock key to act as Control (Ctrl). You won’t need to move your fingers from the home row much with this change. On Mac, I used Karabiner to remap the keys.

Why Spacemacs?

Emacs,  “a great operating system, lacking only a decent editor” – http://en.wikipedia.org/wiki/Editor_war

At the same time, Vim is said to suffer from a less than ideal scripting language and window management capabilities. Spacemacs gives you the best of both worlds.

The two primary reasons I went with Spacemacs were:

  1. Batteries included
  2. Mnemonics

Batteries included

This was the biggest factor for me. I’ve tried to transition into using Vim in the past. To get to a really productive dev environment, you need to spend significant time going through various forums, identifying the plugins you need, installing them using some package manager like Vundle etc. This can be quite daunting for a newcomer.

Spacemacs has this concept of layers. For example, each language has its own layer. When you install a layer, Spacemacs installs the best community-curated plugins for that language for you. Even better, it auto-detects which layer to install when you open a new type of file and asks you whether you want to install it. A simple ‘y’ and you’re set to go!

The layers that I have installed as of April 2018:

dotspacemacs-configuration-layers
 '(ruby
 php
 csv
 yaml
 clojure
 javascript
 python
 java
 c-c++
 markdown
 html
 ivy
 (auto-completion :variables
                   auto-completion-enable-help-tooltip t
                   auto-completion-enable-snippets-in-popup t)
 better-defaults
 emacs-lisp
 git
 markdown
 org
 (shell :variables
        shell-default-height 30
        shell-default-position 'bottom)
 ;; spell-checking
 syntax-checking
 version-control)

Mnemonics

Another issue that I had faced in the past was learning and memorizing all the keyboard shortcuts. Vim and Emacs have evolved over more than 20 years now. Different plugins were created at different points in time. Hence there is no consistency or logical grouping among their shortcuts.

Spacemacs solves that by taking all the plugins/shortcuts and logically grouping them by mnemonics. Spacebar is the leader key (the main key for issuing any command). That itself is an improvement over other editors, which strain your little fingers and can result in carpal tunnel syndrome (CTS).

To give some examples of how the mnemonics work:

  1. SPC w gives you all the window related commands.
    1. SPC w d deletes a window
    2. SPC w m maximizes a window
  2. SPC b gives you all buffer related commands.
  3. SPC g gives you all git related commands.

Spacemacs comes with the which-key plugin which shows a helpful menu whenever you press SPC so that you don’t need to memorize all the shortcuts.

Figure: The Spacemacs which-key plugin helps you find what you are looking for
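
One way to picture the mnemonic grouping is as a tree of keys under SPC, where each level narrows down the command. The Python sketch below is purely illustrative: it contains only the bindings named above, and the data layout and function names are assumptions, not how Spacemacs or which-key store their bindings.

```python
# A tiny, hypothetical slice of the key tree; only the bindings named above.
KEY_TREE = {
    "w": {"_name": "windows", "d": "delete window", "m": "maximize window"},
    "b": {"_name": "buffers"},
    "g": {"_name": "git"},
}


def lookup(keys):
    """Follow a key sequence typed after SPC, e.g. 'wd'."""
    node = KEY_TREE
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def which_key(keys=""):
    """List the next available keys, similar in spirit to the which-key menu."""
    node = lookup(keys)
    if not isinstance(node, dict):
        return []
    return sorted(
        (key, child["_name"] if isinstance(child, dict) else child)
        for key, child in node.items()
        if key != "_name"
    )


print(which_key())     # what a menu after pressing SPC might list
print(which_key("w"))  # what a menu after SPC w might list
print(lookup("wd"))    # the command bound to SPC w d
```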

If you are interested in learning more about Spacemacs, I highly recommend the Spacemacs ABC tutorial series by Eivind Fonn – YouTube playlist link. It is a series of 8 videos, in each of which Eivind explores one or more keys under SPC in detail.

Some cool stuff

These are some of the cool features that made the whole experience more fun.

Magit

Magit is a plugin for git version control. Until now, I generally preferred command-line git for all version control use-cases. But the 3-window Ediff merging capability of Magit is something that sold me completely. Resolving merge conflicts was never so much fun!

Figure: 3-window Ediff merging using Magit. Courtesy: Magit Ediff tutorial

Undo Tree

Shortcut: SPC a u

Figure: Undo tree lets you browse through all past changes

With most modern editors, we are used to a linear timeline of changes. If you go back to your past change and modify something, you effectively lose history of all the things you did after that change. With the undo tree, your entire history is preserved and you can easily navigate to any point in time including any branching in history you did.
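
The difference from a linear history can be sketched with a small tree structure in Python. This is only an illustration of the idea, with hypothetical class and method names; it is not how the undo-tree package is implemented.

```python
class UndoNode:
    """One saved state of the buffer, linked to its parent and children."""

    def __init__(self, text, parent=None):
        self.text = text
        self.parent = parent
        self.children = []


class UndoTree:
    def __init__(self, text=""):
        self.root = UndoNode(text)
        self.current = self.root

    def edit(self, new_text):
        """Record a change as a new child of the current state."""
        node = UndoNode(new_text, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def undo(self):
        if self.current.parent is not None:
            self.current = self.current.parent

    def redo(self, branch=-1):
        """Move to a child state; by default the most recently created branch."""
        if self.current.children:
            self.current = self.current.children[branch]


tree = UndoTree("a")
tree.edit("ab")
tree.edit("abc")
tree.undo()          # back to "ab"
tree.edit("abX")     # a linear history would discard "abc" here
tree.undo()
tree.redo(branch=0)  # the older branch is still reachable
print(tree.current.text)
```

Because every edit adds a child instead of overwriting the redo history, the state "abc" survives the later edit and can still be reached.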

Vim Macros

You can record and replay commands using Macros. A good introduction here – link. This can be pretty powerful if you need to do something repetitive in a smart way.

Shortcut to record: q<letter><commands>q
Shortcut to replay: <number>@<letter>

You might think these are pretty esoteric tools without much practical use. I too thought so until I came across multiple situations where I was able to use them effectively.

Vimium (Bonus)

Once you get to using Vim, you will resent having to take your hands off the home row (where hjkl keys are situated) at any time. Moving your hands to the trackpad could feel like too much work.

Worry not, Vimium comes to the rescue – https://vimium.github.io/.
It’s a plugin for Chrome (there is Vimari for Safari, though it is not as feature-rich as Vimium) that lets you navigate the web using your keyboard, mostly with keys in the home row.

Figure: Vimium lets you navigate the web using your keyboard

Being Pragmatic

Even though I’ve talked so much about Spacemacs, I have realized that it might just not be the right tool for some cases. In particular, for me, it was not working for Java development.

I tried all the options available in Spacemacs. Eclim, which comes with the Java layer, essentially runs a headless Eclipse process in the backend. However, it is pretty slow. Other options like malabar, jdee or meghanada were either outdated, not being actively developed or not working. It’s difficult to write Java without features such as proper auto-complete and go-to-definition.

I found that the IntelliJ team offers an IdeaVim plugin that lets you use IntelliJ with Vim keybindings. This, along with the distraction-free mode (Cmd + I), gave me an experience close to Spacemacs, but with all the Java goodness.


I am still learning a lot in both Vim and Spacemacs. I hope to update this post with more interesting/useful features that I uncover over time.

Artificial Intelligence – Experience

I recently completed the Artificial Intelligence course (CS 6601) as part of OMSCS Fall 2017. The course gives a good overview of the different key areas within AI. Having taken Knowledge Based AI (CS 7637), AI for Robotics (CS 8803-001), Machine Learning (CS 7641) and Reinforcement Learning (CS 8803-003) before, I must say that the AI course syllabus had significant overlap in many areas with these courses (which is expected). However, I felt the course was still worthwhile since Prof. Thad taught these topics from his own perspective, which made me look at them in a different light. Prof. Thad also tried his best to make the course content interesting and humorous, which I really appreciated.

Course Outline

  1. Game Playing – Iterative Deepening, Minimax trees, Alpha Beta Pruning etc.
  2. Search – Uniform Cost Search, Bidirectional UCS, A*, Bidirectional A* etc.
  3. Simulated Annealing – Hill Climbing, Random restarts, Simulated Annealing, Genetic Algorithms etc.
  4. Constraint Satisfaction – Node, Arc and Path consistency, Backtracking, forward checking etc.
  5. Probability – Bayes Rule, Bayes Nets basics, Dependence etc.
  6. Bayes Nets – Conditional Independence, Confounding cause, Explaining Away, D Separation, Gibbs Sampling, Monty Hall Problem etc.
  7. Machine Learning – kNN, Expectation Maximization, Decision Trees, Random forests, Boosting, Neural nets etc.
  8. Pattern Recognition through Time – Dynamic Time Warping, Sakoe Chiba bounds, Hidden Markov Models, Viterbi Trellis etc.
  9. Logic and Planning – Propositional Logic, Classic planning, Situation Calculus etc.
  10. Planning under Uncertainty – Markov Decision Processes (MDPs), Value iteration, Policy iteration, POMDPs etc.

The course used the classic AI textbook – Artificial Intelligence – A Modern Approach (3rd Edition) by Peter Norvig and Stuart Russell. Some chapters (such as Logic and Planning) were taught by Peter Norvig himself, whereas a few others were taught by Sebastian Thrun. There is no arguing that the course was taught by the industry’s best.

Figure: The iconic cover of Artificial Intelligence: A Modern Approach

There were 6 assignments (roughly one every other week) which required a proper understanding of the course material and a decent amount of coding (in Python). There were an open-book midterm and final exam as well. Even though these were open book, they involved a significant amount of work (researching and rereading the text, on-paper calculations etc.). Overall, completing these forces one to really understand the concepts, which I really liked.

Summary Stats

  1. Average time spent per week – approx. 20 hours (including whole weekends on assignment due weeks)
  2. Difficulty (out of 5) – 4.25 (which is what I would rate ML too, and these two would top my list)
  3. Rating – 4/5

Introduction to Information Security – Experience

I did the Introduction to Information Security course (CS 6035) as part of the OMSCS Summer 2017 semester.

The course was a good overview of various aspects of Information Security. It broadly covered topics like system security, network security, web security, cryptography and different types of malware. The course was lighter in terms of workload compared to the other subjects I’ve taken so far. I really liked the projects, which were thoughtfully designed to give the students hands-on experience in each of these topics.

The four projects that we had to do were:

  1. Exploiting a buffer overflow in a given piece of vulnerable code. This required brushing up on C basics, understanding how process memory allocation works internally and some playing around with gdb.
  2. Analyzing provided malware samples using Cuckoo, an automatic malware analyzer and sandbox, to identify behaviors such as registry updates, keyboard and mouse sniffing, remote access, privilege escalation etc.
  3. Understanding and implementing the RSA algorithm in Python, identifying the weakness in using short keys (64 bit) and decrypting an RSA-encrypted message by exploiting this weakness.
  4. Exploiting vulnerabilities in a target (sample) website using Cross-Site Request Forgery (XSRF), Cross Site Scripting (XSS) and SQL injection.
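
To illustrate why short RSA keys are weak (project 3 above): if the modulus n is small enough to factor, the private exponent follows directly from the public key. The Python sketch below uses a deliberately tiny textbook modulus and plain trial division. It is not the approach prescribed by the course, and all names and numbers are assumptions chosen for illustration.

```python
def factor(n):
    """Trial division: only feasible here because n is tiny."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p, n // p
        p += 1
    raise ValueError("n is prime")


def recover_private_exponent(n, e):
    """Derive the private exponent d from the public key (n, e)."""
    p, q = factor(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # modular inverse (needs Python 3.8+)


# Toy public key (n = 53 * 61) and a message encrypted with it.
n, e = 3233, 17
message = 65
ciphertext = pow(message, e, n)

# The attacker sees only n, e and the ciphertext.
d = recover_private_exponent(n, e)
print(pow(ciphertext, d, n))
```

For a 64-bit modulus, trial division in pure Python would be slow, so a faster factoring method would likely be needed, but the rest of the attack stays the same.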

Apart from the projects, there were 10 quizzes to be completed, one per week throughout the course. The vulnerabilities behind the exploits discussed in the course are fairly easy to introduce into a codebase if you are not aware of them. Unfortunately, they are pretty common even now, many years after they were first discovered.

Hence, no matter the type of software development one is into (mobile, web, DB, relatively low level languages like C, embedded device programming, bare metal etc.), these exploits and their counter-measures are a must-know.

 

Machine Learning – Experience

I recently completed CS 7641 – Machine Learning as part of my OMSCS coursework. The course was really enjoyable and informative.

The course was taught by Professors Charles Isbell and Michael Littman. Both are really awesome. Unlike most other courses on the topic, they have managed to make the course content easy to understand and interesting, without losing any of its essence. All videos are structured as conversations between the Profs, where one acts as the teacher and the other as the student – very effective.

All the course videos are available publicly on YouTube – link. Also, I would recommend watching this funny a cappella on ML, based on Thriller, by the Profs – link. 🙂

The course was a literature survey and general introduction to the various areas in ML. It was primarily divided into 3 modules:

  • Supervised learning – where we are given a dataset with labels (emails classified as spam or not). You try to predict the labels for future data based on what you’ve already seen or ‘learned’.
    • Techniques include Decision Trees, K-Nearest Neighbours, Support Vector Machines (SVM), Neural Networks etc
  • Unsupervised learning – all about finding patterns in unlabeled data. E.g. grouping similar products together (clustering) based on customer interactions. This can be really helpful in recommendations etc.
    • Randomized Optimization, clustering, feature selection and transformation etc.
  • Reinforcement learning – the most exciting one (IMHO). This overlaps with many concepts we usually consider part of Artificial Intelligence. RL is about incentivizing machines to learn various tasks (such as playing chess) by providing different rewards.
    • Markov Decision Processes, Game Theory etc.
    • I found the concepts in Game Theory, such as the Prisoner’s Dilemma and Nash Equilibrium, and how they tie into RL, interesting.
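
As a small illustration of the Game Theory concepts mentioned above, here is a sketch that finds the pure-strategy Nash equilibria of the Prisoner’s Dilemma by brute force. The payoff values are a commonly used textbook choice (negated years in prison) and are an assumption on my part, not numbers from the course. A pair of actions is an equilibrium when neither player can do better by changing only their own action.

```python
from itertools import product

COOPERATE, DEFECT = "cooperate", "defect"
ACTIONS = (COOPERATE, DEFECT)

# PAYOFFS[(a, b)] = (payoff to player 1, payoff to player 2).
# Assumed textbook values: negated years in prison, so higher is better.
PAYOFFS = {
    (COOPERATE, COOPERATE): (-1, -1),
    (COOPERATE, DEFECT): (-3, 0),
    (DEFECT, COOPERATE): (0, -3),
    (DEFECT, DEFECT): (-2, -2),
}


def pure_nash_equilibria(payoffs, actions):
    """Return every action pair where no player gains by deviating alone."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        p1, p2 = payoffs[(a, b)]
        # Player 1 cannot improve by switching while player 2 keeps b.
        best_for_1 = all(payoffs[(alt, b)][0] <= p1 for alt in actions)
        # Player 2 cannot improve by switching while player 1 keeps a.
        best_for_2 = all(payoffs[(a, alt)][1] <= p2 for alt in actions)
        if best_for_1 and best_for_2:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, ACTIONS))
```

With these payoffs the only equilibrium is mutual defection, even though both players would be better off cooperating. That tension is what makes the game interesting in a multi-agent RL setting.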

All of these are vast subjects in themselves. The assignments were designed in such a way that we got to work with all of these techniques at least to some extent. The choice of languages and libraries was left to us, though guidance and recommendations were provided. Through that, I got the opportunity to work with Weka, scikit-learn and BURLAP.

Overall, I really enjoyed the course. I hope to take courses like Reinforcement Learning (link) in upcoming semesters to learn more about these topics.

The Pragmatic Programmer

After having it on my to-do and wish list for about a year, I finally ordered and read ‘The Pragmatic Programmer’. It was a really interesting read. I was able to relate to many of the chapters in it. The book talks about how programmers can rise from journeymen to masters.

The book contains many (70 to be precise) one-line nuggets of programming wisdom. The authors themselves have made these available online here. Coding Horror (Jeff Atwood) also has a handy quick reference to many of the ideas mentioned in the book – link.

Even though the tips by themselves are great, I would recommend reading the whole book rather than reading them in isolation. What makes the book great is the way the authors present the ideas in easy-to-understand ways, often using small stories and analogies wherever applicable. Some of the interesting ones are below:

The Broken Window Theory (wiki):

Consider a building with a few broken windows. If the windows are not repaired, the tendency is for vandals to break a few more windows. Eventually, they may even break into the building, and if it’s unoccupied, perhaps become squatters or light fires inside.

This is how human psychology works. The same applies to software quality. If we introduce entropy into the system (in the form of poor code, lack of unit or integration testing, poor review practices etc.), it will spread rapidly and destroy the system. The opposite can also happen: once we establish an immaculate system and great practices, individuals will try not to be the first to lower the standards.

The Stone Soup

The story can be read here. The authors have lessons from both sides of the story:

Tip: Be a Catalyst for Change

Just as the soldiers (or travellers as per the wiki) influenced the villagers and brought about change gradually, if we show people a glimpse of the future, they will be more willing to participate.

Tip: Remember the big picture

The villagers fall for the stone trick because they fail to notice the gradual changes. This can happen to our software systems and projects as well. The next point is related.

The Boiled Frog

If a frog is put suddenly into boiling water, it will jump out, but if it is put in cold water which is then brought to a boil slowly, it will not perceive the danger and will be cooked to death.

The story is often used as a metaphor for the inability or unwillingness of people to react to or be aware of threats that rise gradually. Gradual increases in CPU/memory utilisation or service latencies which eventually bring down systems come to mind here. Gradual feature creep and/or project delays which eventually add up to failed projects are also examples.

Some of the programming pearls of wisdom that I found most compelling were:

The Requirement Pit 

Requirements are often unclear and mixed with current policies and implementation. We must capture the underlying semantic invariants as requirements and document the specific or current work practices as policy.

Tip: Abstractions live longer than details

The Law of Demeter for Functions (wiki)

An object’s method should call only methods belonging to:

  • Itself
  • Any parameters passed in
  • Objects it creates
  • Component objects

Following this law helps us write ‘shy’ code which minimises coupling between modules.
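
A minimal sketch of what this looks like in practice, loosely following the well-known "paperboy and wallet" illustration of the law. All class and method names here are hypothetical and chosen only for this example.

```python
class Wallet:
    def __init__(self, balance):
        self._balance = balance

    def withdraw(self, amount):
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount
        return amount


class Customer:
    def __init__(self, wallet):
        self._wallet = wallet  # a component object of Customer

    def pay(self, amount):
        # Allowed: calling a method on one of our own component objects.
        return self._wallet.withdraw(amount)


class Cashier:
    def charge_by_reaching_in(self, customer, amount):
        # Violates the law: the cashier digs into the customer's wallet,
        # so Cashier is now coupled to how Customer stores its money.
        return customer._wallet.withdraw(amount)

    def charge(self, customer, amount):
        # Follows the law: only talks to the parameter it was given.
        return customer.pay(amount)


customer = Customer(Wallet(10))
paid = Cashier().charge(customer, 4)
print(paid)
```

Both `Cashier` methods move the same money, but only `charge` keeps working if `Customer` later switches from a wallet to, say, a card.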

Other tips worth listing:

  • DRY principle – Don’t Repeat Yourself. Avoid duplication of code or documentation.
  • Orthogonality – Decouple systems into independent components.
  • Always use version control (even for documents, memos, scripts – for everything)
  • Use Domain Specific Languages (DSLs) and Code Generators to simplify development
  • Ruthless testing – Test early, test often, test automatically
  • Use prototypes and tracer bullets wherever and whenever possible