ALVINN Multilayer Perceptron

Today’s blog post is about self-driving cars, with a bit of a history lesson thrown in. We will be using data from ALVINN, the first relatively successful autonomous vehicle, which was pioneered by Dean Pomerleau at Carnegie Mellon University in 1989. ALVINN stands for Autonomous Land Vehicle in a Neural Network. It is simply amazing that they were able to do this at that time. Here is a video for those interested. We will use a subset of their data to predict the steering wheel angle from the input image.

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy.io import loadmat

We are working with a small amount of data here, so it is probably useful to try to get more images out of it, a technique called augmentation. We define a function, add_noise, that takes our images and adds random Gaussian noise to each one, doubling the dataset. There are other ways to augment images, like rotating them or flipping them, but not many of those are possible with this data, because the orientation of the image matters for the steering wheel angle.

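def add_noise(imgs, labels):
    # Add zero-mean Gaussian noise (standard deviation 0.15) to a copy of every image.
    new_imgs=imgs+0.15*np.random.randn(*imgs.shape)
    # Stack originals and noisy copies; each noisy copy keeps its original label.
    return np.concatenate((imgs, new_imgs), 0), np.concatenate((labels, labels), 0)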
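To see what the augmentation does to the array shapes, here is a minimal sketch on made-up data. The toy images, the toy labels, and the seed and scale parameters are assumptions added for illustration and are not part of the original function.

```python
import numpy as np

def add_noise(imgs, labels, scale=0.15, seed=0):
    """Return the originals followed by one noisy copy of each image."""
    rng = np.random.default_rng(seed)  # seeded only so the sketch is repeatable
    noisy = imgs + scale * rng.standard_normal(imgs.shape)
    return np.concatenate((imgs, noisy), 0), np.concatenate((labels, labels), 0)

# Toy stand-in for the real data: 4 blank 8x8 "images" with 16-way labels.
toy_imgs = np.zeros((4, 8, 8))
toy_labels = np.eye(16)[[0, 5, 5, 15]]

aug_imgs, aug_labels = add_noise(toy_imgs, toy_labels)
print(aug_imgs.shape, aug_labels.shape)  # (8, 8, 8) (8, 16)
```

The first half of the result is the untouched originals and the second half is the noisy copies, so the image at index i and the one at index i + 4 share a label.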

Create an activation function for the nodes in the network. We will use the sigmoid function, which squashes any real input into the range between 0 and 1: large negative inputs map close to 0, large positive inputs map close to 1, and an input of 0 maps to exactly 0.5.

def sigmoid(x):
    return 1.0/(1.0+np.exp(-x))
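To make the behaviour of the sigmoid concrete, here is a small sketch that evaluates it at a few points and checks numerically that its derivative is s*(1-s), the identity that backpropagation through a sigmoid layer relies on. The sample inputs and the step size eps are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-5.0, 0.0, 5.0])
s = sigmoid(x)
print(np.round(s, 3))  # approximately 0.007, 0.5 and 0.993

# Analytic derivative of the sigmoid, written in terms of its own output.
analytic = s * (1.0 - s)

# Central finite difference as an independent check.
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(np.allclose(analytic, numeric, atol=1e-6))
```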

Load the data. Since it is a .mat file, we will use scipy.io.loadmat to help. Once the file is loaded, the names of the variables within it can be seen by using the keys() function. I also wanted to show an image from near the end of the data (the tenth from last) before and after augmentation to make sure the augmentation worked correctly.

data=loadmat('MPCR_ALVINN_Tensor.mat')
print(data.keys())
labels=data['cat']
data=data['pat']
a=data[data.shape[0]-10, :, :]
data, labels=add_noise(data, labels) # data augmentation
b=data[data.shape[0]-10, :, :]

fig=plt.figure()
ax1=fig.add_subplot(121)
ax1.imshow(a)
ax2=fig.add_subplot(122)
ax2.imshow(b)

print(labels.shape)
print(data.shape)
['cat', '__version__', 'pat', '__header__', '__globals__']
(194, 16)
(194, 227, 227)

Randomize the data and labels.

rand=np.random.permutation(data.shape[0]) # shuffle without replacement so every example appears exactly once
labels=labels[rand, :]
data=data.reshape([data.shape[0], -1])
data=data[rand, :]
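A shuffle should visit every row exactly once and keep each image paired with its label. The sketch below, on toy arrays that are assumptions for illustration, contrasts a permutation of the indices with indices drawn with replacement, which can repeat some rows and skip others.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
toy_data = np.arange(n, dtype=float).reshape(n, 1)  # row i holds the value i
toy_labels = np.arange(n).reshape(n, 1)             # label i belongs to row i

perm = rng.permutation(n)                 # every index exactly once
with_replacement = rng.integers(0, n, n)  # may repeat and skip indices

# Indexing both arrays with the same index vector keeps pairs aligned.
shuffled_data, shuffled_labels = toy_data[perm], toy_labels[perm]

print(sorted(perm.tolist()) == list(range(n)))  # True
print(len(set(with_replacement.tolist())), "unique indices out of", n)
```

Drawing with replacement also matters for the validation split, since a repeated row can land in both the training and validation portions.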

Define the number of nodes in each layer. n1 is the input layer, n2 is the number of hidden nodes, and n3 is the number of output nodes.

n1=data.shape[1]
n2=50
n3=labels.shape[1]

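Standardize each pixel across the dataset by subtracting its mean and dividing by its standard deviation, so that all inputs are on a comparable scale for the network. It’s hard to compare the similarities and differences between a skyscraper and a two-inch replica of one because one is much larger. The small constant 1e-6 prevents division by zero for pixels that never change.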

data=(data-np.mean(data, 0))/(np.std(data, 0)+1e-6)
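The effect of this scaling is easy to verify on a toy matrix. In the sketch below the three columns (one small-scale, one large-scale, one constant) are made-up stand-ins for pixels, chosen to show that every column ends up with zero mean, that varying columns end up with roughly unit standard deviation, and that the constant column does not produce a division by zero.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three hypothetical "pixels": small scale, large scale, and constant.
toy = rng.normal(loc=[0.0, 100.0, 5.0], scale=[1.0, 50.0, 0.0], size=(200, 3))

# Same formula as in the post: per-column mean and standard deviation.
scaled = (toy - np.mean(toy, 0)) / (np.std(toy, 0) + 1e-6)

print(np.round(scaled.mean(0), 6))  # all columns centred on zero
print(np.round(scaled.std(0), 3))   # close to 1 for varying columns, 0 for the constant one
print(np.all(np.isfinite(scaled)))  # True: the 1e-6 term avoids dividing by zero
```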

Reserve 10 percent of the data for validating the training progress and the other 90 percent for actually training the network. There are only 19 validation images, but, since the dataset is so small, we will have to work with that.

n_val=int(data.shape[0]*0.1) # slice indices must be integers
val_data=data[:n_val, :]
val_labels=labels[:n_val, :]
train_data=data[n_val:, :]
train_labels=labels[n_val:, :]

It’s good to check the shapes of the resulting matrices to make sure the validation and training data/labels were selected correctly.

print(train_data.shape)
print(train_labels.shape)
print(val_data.shape)
print(val_labels.shape)
(175, 51529)
(175, 16)
(19, 51529)
(19, 16)

Create the weights for the network. This network only has one hidden layer, so there will be two sets of weights: the weights between the input and hidden layer, and those between the hidden layer activations and the output nodes. A random initialization with values from a normal distribution will work fine for this. It usually helps to scale the values down so they’re really small, too.

w1=0.001*np.random.randn(n1, n2)
w2=0.001*np.random.randn(n2, n3)

Create a placeholder matrix for each weight matrix to store its update in.

dw1=np.zeros(w1.shape)
dw2=np.zeros(w2.shape)

Define the learning rate, which controls how much each weight changes per iteration, and the momentum p, which is the fraction of the previous update carried over into the current one. The learning rate is typically chosen as a small value, because learning can become unstable or fail entirely if it is too large.

learn_rate=0.0001
p=0.8
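To get a feel for what momentum does, here is a minimal sketch that applies the same kind of update (step = learning rate * gradient + p * previous step) to a one-dimensional toy problem, minimizing 0.5*w**2. The toy objective, the descend helper, and the learning rate of 0.01 are assumptions chosen for illustration; they are not the values or the loss used for the network.

```python
def descend(learn_rate, p, steps=50, w=1.0):
    """Gradient descent with momentum on f(w) = 0.5 * w**2, whose minimum is at 0."""
    step = 0.0
    for _ in range(steps):
        grad = w                            # derivative of 0.5 * w**2
        step = learn_rate * grad + p * step  # carry over a fraction p of the last step
        w = w - step
    return w

plain = descend(0.01, 0.0)          # no momentum
with_momentum = descend(0.01, 0.8)  # same learning rate, momentum of 0.8

print(plain, with_momentum)
```

With the same learning rate, the run with momentum ends up closer to the minimum after the same number of steps, because consecutive gradients point the same way and the carried-over steps accumulate.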

This is the main loop of the network. For 600 iterations, we’ll send all of the training images through the first layer (a matrix multiplication with w1), add a constant bias of 0.1 to each node in the hidden layer, and then apply the sigmoid activation function. Then we’ll do that again to go from the hidden layer to the output layer. The output has 16 values for each training example, which together encode the network’s steering prediction for that image. We compare these values to the labels for each example and then, using backpropagation, go backwards through the network to find how to modify every weight so that the error goes down. This is done by finding the partial derivative of the error with respect to each weight.

for iters in range(600):
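    # forward pass: input -> hidden -> output, with a fixed bias of 0.1 on every node
    z1=np.matmul(train_data, w1)+0.1*np.ones([train_data.shape[0], n2])
    a1=sigmoid(z1)
    z2=np.matmul(a1, w2)+0.1*np.ones([train_data.shape[0], n3])
    out=sigmoid(z2)
    
    error=out-train_labels
    l2_loss=np.sqrt(np.sum(error**2))
    # element-wise accuracy over every output of every example
    train_acc=np.mean(np.float32(np.round(out)==train_labels))
    
    # backward pass: error signals at the output and hidden layers (sigmoid derivative is a*(1-a))
    d_w2=error*out*(1-out)
    d_w1=np.matmul(d_w2, w2.transpose())*a1*(1-a1)
    
    # momentum update: new step = learning rate * gradient + p * previous step
    dw1=learn_rate*np.matmul(train_data.transpose(), d_w1)+p*dw1
    dw2=learn_rate*np.matmul(a1.transpose(), d_w2)+p*dw2
    
    w1=w1-dw1
    w2=w2-dw2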
    
    if iters%50==0:
        testact1=sigmoid(np.matmul(val_data, w1)+0.1*np.ones([val_data.shape[0], n2]))
        testout=sigmoid(np.matmul(testact1, w2)+0.1*np.ones([val_data.shape[0], n3]))
        val_acc=np.mean(np.float32(np.round(testout)==val_labels))
        print('Iteration: %d, Training Acc.: %.2f, Val. Acc.: %.2f, l2 loss: %.2f'%(iters, train_acc, val_acc, l2_loss))
    
Iteration: 0, Training Acc.: 0.06, Val. Acc.: 0.06, l2 loss: 27.63
Iteration: 50, Training Acc.: 0.94, Val. Acc.: 0.94, l2 loss: 12.64
Iteration: 100, Training Acc.: 0.94, Val. Acc.: 0.94, l2 loss: 12.20
Iteration: 150, Training Acc.: 0.94, Val. Acc.: 0.94, l2 loss: 11.55
Iteration: 200, Training Acc.: 0.95, Val. Acc.: 0.94, l2 loss: 10.86
Iteration: 250, Training Acc.: 0.96, Val. Acc.: 0.94, l2 loss: 10.24
Iteration: 300, Training Acc.: 0.96, Val. Acc.: 0.95, l2 loss: 9.68
Iteration: 350, Training Acc.: 0.98, Val. Acc.: 0.95, l2 loss: 9.16
Iteration: 400, Training Acc.: 0.98, Val. Acc.: 0.95, l2 loss: 8.64
Iteration: 450, Training Acc.: 0.98, Val. Acc.: 0.96, l2 loss: 8.19
Iteration: 500, Training Acc.: 0.98, Val. Acc.: 0.97, l2 loss: 7.79
Iteration: 550, Training Acc.: 0.98, Val. Acc.: 0.97, l2 loss: 7.48
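A common way to gain confidence in hand-written backpropagation is a gradient check: compare the analytic gradients against finite differences on a tiny network. The sketch below mirrors the structure of the loop above (sigmoid layers, fixed 0.1 bias) on made-up data; the helper names and the tiny layer sizes are assumptions for illustration. Note that the error signals in the loop are the gradient of half the summed squared error, not of the square-rooted value that is printed, so that is the loss used here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, w1, w2):
    a1 = sigmoid(X @ w1 + 0.1)    # fixed 0.1 bias, as in the training loop
    out = sigmoid(a1 @ w2 + 0.1)
    return a1, out

def loss(X, y, w1, w2):
    _, out = forward(X, w1, w2)
    return 0.5 * np.sum((out - y) ** 2)

def analytic_grads(X, y, w1, w2):
    a1, out = forward(X, w1, w2)
    delta2 = (out - y) * out * (1 - out)      # error signal at the output layer
    delta1 = (delta2 @ w2.T) * a1 * (1 - a1)  # error signal at the hidden layer
    return X.T @ delta1, a1.T @ delta2

def numeric_grad(f, w, eps=1e-5):
    """Central finite differences, perturbing one weight at a time in place."""
    g = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        up = f()
        w[idx] = old - eps
        down = f()
        w[idx] = old
        g[idx] = (up - down) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))       # 5 toy examples, 4 inputs
y = np.eye(3)[rng.integers(0, 3, 5)]  # 3 toy output classes
w1 = rng.standard_normal((4, 6))
w2 = rng.standard_normal((6, 3))

g1, g2 = analytic_grads(X, y, w1, w2)
num_g1 = numeric_grad(lambda: loss(X, y, w1, w2), w1)
num_g2 = numeric_grad(lambda: loss(X, y, w1, w2), w2)
print(np.max(np.abs(g1 - num_g1)), np.max(np.abs(g2 - num_g2)))
```

Both printed differences should be tiny compared with the gradients themselves if the backward pass is consistent with the forward pass.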

We have built a simple and relatively good neural network that can predict the steering wheel angle from an image of the road, even with the limited data we used. Training for longer might raise the accuracy a little, and more nodes in the hidden layer could help too. If this network were going to be used in a self-driving car, though, we would need a lot more data to make it generalize to different road conditions.
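One caveat when reading the accuracy numbers: they are computed element-wise over all 16 outputs of every example. If each label row has a single active output (an assumption about the label format, which the post does not state explicitly), then a network that outputs all zeros already matches 15 of the 16 entries per row. The sketch below, on made-up one-hot labels, compares the element-wise measure with a per-example measure that only counts a prediction as correct when its largest output is in the right position.

```python
import numpy as np

n_examples, n_out = 8, 16
# One-hot rows: an assumed label format, used here only for illustration.
toy_labels = np.eye(n_out)[np.arange(n_examples) % n_out]
all_zero = np.zeros((n_examples, n_out))  # a network that never commits to an output

# Element-wise accuracy, as computed in the training loop.
elementwise_acc = np.mean(np.round(all_zero) == toy_labels)

# Per-example accuracy: is the largest output in the right position?
argmax_acc = np.mean(np.argmax(all_zero, 1) == np.argmax(toy_labels, 1))

print(elementwise_acc)  # 0.9375
print(argmax_acc)       # 0.125
```

Under that assumption, the per-example measure would be the more informative number to track alongside the loss.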

Let’s visualize the weights from some random hidden nodes to see what these nodes are looking for in the input.

fig=plt.figure()
ax1=fig.add_subplot(131)
ax1.imshow(w1[:, 0].reshape([227, 227]))
ax2=fig.add_subplot(132)
ax2.imshow(w1[:, 10].reshape([227, 227]))
ax3=fig.add_subplot(133)
ax3.imshow(w1[:, 44].reshape([227, 227]))

For tasks like this, a convolutional neural network would probably be better: fully-connected networks can do reasonably well on images, but they are not the best tool for the job. In a future blog post, we’ll apply CNNs to this data to see how they perform relative to the fully-connected network. Recurrent neural networks would probably do well on this type of task as well, and many groups working on autonomous driving are employing RNNs on driving data. These types of networks have a sort of memory and can be applied to any data where the data/output at one time step is predictive of the data/output at the next time step or the next few time steps.