Pedestrian Detector

Pedestrian Detector NN

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy.io import loadmat

First, define the sizes of the network's layers. n1 is the number of nodes in the input layer, one per pixel (each image is 18 x 36 = 648 pixels), and num_h is the number of nodes in the hidden layer.

n1=648
num_h=50

Create an activation function for the nodes in the network. We will use the sigmoid function, which squashes any real input into the range between 0 and 1: large positive inputs map close to 1, large negative inputs map close to 0, and an input of exactly 0 maps to 0.5.

def sigmoid(x):
    return 1.0/(1.0+np.exp(-x))
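A quick numerical check of that description, as a sketch: the values below show outputs approaching but never reaching 0 and 1, and the derivative written in terms of the output, s(1 - s), which is the factor that appears as out*(1-out) and a1*(1-a1) in the training loop later on. The helper name sigmoid_grad_from_output is hypothetical, not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad_from_output(s):
    """Hypothetical helper: derivative of the sigmoid expressed through its output s = sigmoid(x)."""
    return s * (1.0 - s)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s = sigmoid(x)
print(np.round(s, 4))                            # close to 0 on the left, 0.5 in the middle, close to 1 on the right
print(np.round(sigmoid_grad_from_output(s), 4))  # slope peaks at 0.25 at x = 0 and vanishes in the tails
```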

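Load the data. Since it is a MATLAB .mat file, we will use scipy.io.loadmat to read it.

mat=loadmat('Daimler_ped_data.mat')
# Each column of c1/c2 is one flattened image; transpose so that each row is one image.
no_ped=mat['c1'].transpose()  # non-pedestrian images
ped=mat['c2'].transpose()     # pedestrian images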

It is always useful to plot a few examples after loading the data to make sure it was loaded correctly. Here, one image from the pedestrian data and one from the non-pedestrian data are shown using matplotlib.

fig=plt.figure()
ax1=fig.add_subplot(121)
ax1.imshow(ped[0, :].reshape([18, 36]), cmap='gray')
ax2=fig.add_subplot(122)
ax2.imshow(no_ped[0, :].reshape([18, 36]), cmap='gray')

Now create the training labels (1 for pedestrian, 0 for non-pedestrian) and stack the pedestrian and non-pedestrian images together into one dataset.

labels=np.zeros([ped.shape[0]+no_ped.shape[0], 1])
labels[:ped.shape[0], 0]=1
data=np.concatenate((ped, no_ped), 0)

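Shuffle the data and labels together so the pedestrian and non-pedestrian images are mixed up. Use a random permutation of the row indices rather than np.random.randint: randint samples with replacement, so some images would be duplicated and others dropped, and a duplicated image could land in both the training and validation sets.

# A permutation uses every row exactly once; applying it to both arrays keeps images and labels aligned.
rand=np.random.permutation(data.shape[0])
labels=labels[rand, :]
data=data[rand, :]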
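To see why the choice of index array matters, here is a small sketch using NumPy's Generator API (np.random.default_rng, an assumption here; the notebook uses the legacy np.random functions). Drawing n indices with replacement, which is what np.random.randint(0, n, n) does, repeats some rows and skips others, while a permutation uses every row exactly once. The toy arrays are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 9800  # same total count as this notebook's dataset (8820 training + 980 validation)

# Sampling indices *with* replacement: some rows repeat, others never appear.
with_replacement = rng.integers(0, n, n)
# A permutation visits every row exactly once, in random order.
perm = rng.permutation(n)

print("unique rows, with replacement:", np.unique(with_replacement).size)
print("unique rows, permutation:     ", np.unique(perm).size)

# Applying the same index array to both arrays keeps each "image" paired with its label.
data = np.arange(n)[:, None] * np.ones((1, 3))  # toy data: row i is filled with the value i
labels = np.arange(n)[:, None]                   # toy labels: label i belongs to row i
data_s, labels_s = data[perm, :], labels[perm, :]
```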

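Standardize each pixel so that, across the dataset, it has zero mean and unit variance. This puts every input on a comparable scale, so no pixel dominates the weighted sums simply because its raw values are larger, and it helps keep the hidden-layer inputs in the range where the sigmoid is not flat. The small constant 1e-6 avoids division by zero for any pixel whose value never changes.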

data=(data-np.mean(data, 0))/(np.std(data, 0)+1e-6)
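The sketch below checks the standardization formula on a made-up toy matrix (200 rows, 4 columns, one of them constant) rather than the real images: afterwards every column has a mean of about 0, the varying columns have a standard deviation of about 1, and the constant column becomes all zeros instead of triggering a division by zero, thanks to the 1e-6 term. One design note, offered as common practice rather than something this notebook does: in a stricter setup the mean and standard deviation would be computed on the training split only and then reused for the validation data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy stand-in for the image matrix: 200 "images" of 4 "pixels" each (random, not real data).
x = rng.uniform(0, 255, size=(200, 4))
x[:, 3] = 128.0  # a pixel that never changes, to show what the 1e-6 term is for

mu = np.mean(x, 0)      # per-pixel mean over all images
sigma = np.std(x, 0)    # per-pixel standard deviation over all images
x_std = (x - mu) / (sigma + 1e-6)

print(np.round(x_std.mean(0), 6))  # every column mean is about 0
print(np.round(x_std.std(0), 3))   # about 1 for varying columns, 0 for the constant one
```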

Reserve 10 percent of the data for validating the training progress and the other 90 percent for actually training the network.

# Slice indices must be integers, so convert the 10 percent split point with int().
n_val=int(0.1*data.shape[0])
val_data=data[:n_val, :]
val_labels=labels[:n_val, :]
train_data=data[n_val:, :]
train_labels=labels[n_val:, :]
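The same 90/10 split can be wrapped in a small function. This is a sketch with a hypothetical helper name, split_train_val, run on dummy zero arrays with the notebook's shapes (9800 images of 648 pixels). It assumes the rows have already been shuffled, and it converts the split point with int() because NumPy slice bounds must be integers.

```python
import numpy as np

def split_train_val(data, labels, val_fraction=0.1):
    """Hypothetical helper: the first val_fraction of the (already shuffled) rows
    become the validation set and the remaining rows become the training set."""
    n_val = int(data.shape[0] * val_fraction)  # slice bounds must be integers
    return (data[n_val:], labels[n_val:],      # training data, training labels
            data[:n_val], labels[:n_val])      # validation data, validation labels

# Dummy arrays with the same shapes as this notebook's data (not the real images).
data = np.zeros((9800, 648), dtype=np.float32)
labels = np.zeros((9800, 1), dtype=np.float32)
train_data, train_labels, val_data, val_labels = split_train_val(data, labels)
print(train_data.shape, train_labels.shape, val_data.shape, val_labels.shape)
```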

It’s good to check the shapes of the resulting matrices to make sure the validation and training data/labels were selected correctly.

print(train_data.shape)
print(train_labels.shape)
print(val_data.shape)
print(val_labels.shape)
(8820, 648)
(8820, 1)
(980, 648)
(980, 1)

Create the weights for the network. This network only has one hidden layer, so there will be two sets of weights: the weights between the input and hidden layer, and those between the hidden layer activations and the output node. A random initialization with values drawn from a standard normal distribution works fine here; the randomness breaks the symmetry between hidden nodes so they do not all learn the same thing. Scaling the values down (here by 0.001) keeps the initial weighted sums close to zero, where the sigmoid is steepest and the gradients are largest.

w1=0.001*np.random.randn(n1, num_h)
w2=0.001*np.random.randn(num_h, 1)
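Why make the initial weights small? A rough sketch, using random standard-normal inputs as a stand-in for the standardized images (not the real data): with a 0.001 scale the hidden pre-activations stay near the 0.1 bias, where the sigmoid slope is close to its maximum of 0.25, whereas a scale of 1.0 spreads them with a standard deviation of roughly 25 (about the square root of 648), where most sigmoids are saturated and pass back almost no gradient.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n1, num_h = 648, 50
# Stand-in for standardized images: zero mean, unit variance per pixel (random, not real data).
x = rng.standard_normal((1000, n1))

slopes = {}
for scale in (0.001, 1.0):
    w1 = scale * rng.standard_normal((n1, num_h))
    z1 = x @ w1 + 0.1                       # same constant hidden bias of 0.1 as the notebook
    a1 = sigmoid(z1)
    slopes[scale] = np.mean(a1 * (1 - a1))  # average sigmoid slope; the maximum possible is 0.25
    print(f"scale={scale}: std of z1 = {np.std(z1):.3f}, mean slope = {slopes[scale]:.3f}")
```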

Create zero-initialized matrices, one per weight matrix, to store the previous update for each weight. The momentum term in the training loop needs them.

dw1=np.zeros(w1.shape)
dw2=np.zeros(w2.shape)

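Define the learning rate and the momentum coefficient p. The learning rate scales how far each weight moves along its gradient on every iteration. It is typically chosen to be small, because a value that is too large can make training unstable or keep it from converging at all. The momentum coefficient p is the fraction of the previous update that is carried over into the current one, which smooths the updates and speeds up progress along directions where the gradient is consistent.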

learn_rate=0.0001
p=0.8
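The update rule the loop uses, dw = learn_rate*gradient + p*dw followed by w = w - dw, is classical (heavy-ball) momentum. As an illustration only, the sketch below applies that same rule to a one-dimensional quadratic f(w) = (w - 3)^2 with a made-up learning rate of 0.01, comparing p = 0 (plain gradient descent) with p = 0.8, the value used here. After 100 steps, the momentum run ends much closer to the minimum at w = 3. The function name momentum_descent is hypothetical.

```python
def momentum_descent(grad, w0, learn_rate, p, steps):
    """Hypothetical helper: classical momentum in the notebook's form,
    dw = learn_rate * grad(w) + p * dw, then w = w - dw."""
    w, dw = w0, 0.0
    for _ in range(steps):
        dw = learn_rate * grad(w) + p * dw
        w = w - dw
    return w

grad = lambda w: 2.0 * (w - 3.0)  # gradient of f(w) = (w - 3)^2, whose minimum is at w = 3

plain = momentum_descent(grad, 0.0, learn_rate=0.01, p=0.0, steps=100)
heavy = momentum_descent(grad, 0.0, learn_rate=0.01, p=0.8, steps=100)
print(f"plain gradient descent: {plain:.4f}, with momentum p=0.8: {heavy:.4f}")
```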

This is the main training loop. On each of 2001 iterations (0 through 2000), we send all of the training images through the network at once: multiply them by w1, add a constant bias of 0.1 to every hidden node (the biases are fixed, not learned), and pass the result through the sigmoid activation function. We then do the same again to go from the hidden layer to the output node. The output gives one value between 0 and 1 for each training example, telling us how strongly the network thinks that example contains a pedestrian. Comparing these values with the labels gives the error, and backpropagation then works backwards through the network to find how to change every weight so that the error goes down. This is done by computing, with the chain rule, the partial derivative of the error with respect to each weight, and then stepping each weight against its derivative (with momentum). Every 50 iterations, the loop also reports the training accuracy, the validation accuracy, and the L2 norm of the error vector.
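Backpropagation mistakes are easy to make and hard to spot from a loss curve alone, so a common sanity check is to compare the analytic gradients against finite differences. The sketch below rebuilds the notebook's forward and backward pass on a tiny random network (5 inputs, 4 hidden nodes, 8 examples, all made up for the check) and compares the results. The helper names forward, loss, backprop, and numerical_grad are hypothetical. It assumes the function being differentiated is 0.5*sum(error**2): that is the loss whose gradient matches the loop's delta formulas, while the printed l2_loss, sqrt(sum(error**2)), has the same gradient direction scaled by 1/l2_loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w1, w2):
    a1 = sigmoid(x @ w1 + 0.1)    # hidden layer, constant bias 0.1 as in the notebook
    out = sigmoid(a1 @ w2 + 0.1)  # single output node
    return a1, out

def loss(x, y, w1, w2):
    _, out = forward(x, w1, w2)
    return 0.5 * np.sum((out - y) ** 2)  # squared error whose gradient the deltas compute

def backprop(x, y, w1, w2):
    a1, out = forward(x, w1, w2)
    delta2 = (out - y) * out * (1 - out)      # dL/dz2 (the notebook's output error signal)
    delta1 = (delta2 @ w2.T) * a1 * (1 - a1)  # dL/dz1 (the hidden-layer error signal)
    return x.T @ delta1, a1.T @ delta2        # dL/dw1, dL/dw2

def numerical_grad(f, w, eps=1e-6):
    """Central finite differences, perturbing one weight at a time in place and restoring it."""
    g = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = f()
        w[idx] = old - eps
        f_minus = f()
        w[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

rng = np.random.default_rng(3)
x = rng.standard_normal((8, 5))                    # 8 toy examples with 5 "pixels"
y = rng.integers(0, 2, size=(8, 1)).astype(float)  # random 0/1 labels
w1 = 0.5 * rng.standard_normal((5, 4))             # 4 hidden nodes
w2 = 0.5 * rng.standard_normal((4, 1))

g1, g2 = backprop(x, y, w1, w2)
num_g1 = numerical_grad(lambda: loss(x, y, w1, w2), w1)
num_g2 = numerical_grad(lambda: loss(x, y, w1, w2), w2)
print(np.max(np.abs(g1 - num_g1)), np.max(np.abs(g2 - num_g2)))  # both should be tiny
```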

for iters in range(2001):
    # Forward pass over the whole training set (full-batch gradient descent).
    z1=np.matmul(train_data, w1)+0.1  # constant bias of 0.1 for every hidden node
    a1=sigmoid(z1)
    z2=np.matmul(a1, w2)+0.1          # constant bias of 0.1 for the output node
    out=sigmoid(z2)
    
    # Error, L2 norm of the error (for reporting only), and training accuracy.
    error=out-train_labels
    l2_loss=np.sqrt(np.sum(error**2))
    train_acc=np.mean(np.float32(np.round(out)==train_labels))
    
    # Backward pass: error signals at the output and hidden layers,
    # i.e. the derivatives of 0.5*sum(error**2) with respect to z2 and z1.
    delta2=error*out*(1-out)
    delta1=np.matmul(delta2, w2.transpose())*a1*(1-a1)
    
    # Momentum updates: scaled gradient plus a fraction p of the previous update.
    dw1=learn_rate*np.matmul(train_data.transpose(), delta1)+p*dw1
    dw2=learn_rate*np.matmul(a1.transpose(), delta2)+p*dw2
    
    w1=w1-dw1
    w2=w2-dw2
    
    # Every 50 iterations, evaluate on the held-out validation set.
    if iters%50==0:
        testact1=sigmoid(np.matmul(val_data, w1)+0.1)
        testout=sigmoid(np.matmul(testact1, w2)+0.1)
        val_acc=np.mean(np.float32(np.round(testout)==val_labels))
        print('Iteration: %d, Training Acc.: %.2f, Val. Acc.: %.2f, l2 loss: %.2f'%(iters, train_acc, val_acc, l2_loss))
    
Iteration: 0, Training Acc.: 0.49, Val. Acc.: 0.50, l2 loss: 47.06
Iteration: 50, Training Acc.: 0.86, Val. Acc.: 0.86, l2 loss: 30.15
Iteration: 100, Training Acc.: 0.88, Val. Acc.: 0.87, l2 loss: 28.18
Iteration: 150, Training Acc.: 0.89, Val. Acc.: 0.89, l2 loss: 26.69
Iteration: 200, Training Acc.: 0.91, Val. Acc.: 0.90, l2 loss: 24.55
Iteration: 250, Training Acc.: 0.93, Val. Acc.: 0.91, l2 loss: 22.10
Iteration: 300, Training Acc.: 0.95, Val. Acc.: 0.93, l2 loss: 19.90
Iteration: 350, Training Acc.: 0.96, Val. Acc.: 0.94, l2 loss: 18.05
Iteration: 400, Training Acc.: 0.97, Val. Acc.: 0.95, l2 loss: 16.52
Iteration: 450, Training Acc.: 0.98, Val. Acc.: 0.96, l2 loss: 15.22
Iteration: 500, Training Acc.: 0.98, Val. Acc.: 0.96, l2 loss: 14.10
Iteration: 550, Training Acc.: 0.98, Val. Acc.: 0.96, l2 loss: 13.12
Iteration: 600, Training Acc.: 0.99, Val. Acc.: 0.97, l2 loss: 12.27
Iteration: 650, Training Acc.: 0.99, Val. Acc.: 0.97, l2 loss: 11.52
Iteration: 700, Training Acc.: 0.99, Val. Acc.: 0.97, l2 loss: 10.85
Iteration: 750, Training Acc.: 0.99, Val. Acc.: 0.97, l2 loss: 10.25
Iteration: 800, Training Acc.: 0.99, Val. Acc.: 0.97, l2 loss: 9.71
Iteration: 850, Training Acc.: 0.99, Val. Acc.: 0.97, l2 loss: 9.22
Iteration: 900, Training Acc.: 0.99, Val. Acc.: 0.97, l2 loss: 8.76
Iteration: 950, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 8.35
Iteration: 1000, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 7.97
Iteration: 1050, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 7.63
Iteration: 1100, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 7.34
Iteration: 1150, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 7.08
Iteration: 1200, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 6.85
Iteration: 1250, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 6.64
Iteration: 1300, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 6.45
Iteration: 1350, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 6.28
Iteration: 1400, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 6.12
Iteration: 1450, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 5.97
Iteration: 1500, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 5.84
Iteration: 1550, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 5.71
Iteration: 1600, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 5.60
Iteration: 1650, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 5.49
Iteration: 1700, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 5.39
Iteration: 1750, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 5.30
Iteration: 1800, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 5.21
Iteration: 1850, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 5.13
Iteration: 1900, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 5.05
Iteration: 1950, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 4.98
Iteration: 2000, Training Acc.: 1.00, Val. Acc.: 0.98, l2 loss: 4.90

As you can see, we have built a simple neural network that classifies small image crops as containing a pedestrian or not, and does so quite well. The loss decreased, and both training and validation accuracy increased as training progressed. On the validation images, which it is not trained on, the network reached an accuracy of 98 percent, slightly below its final training accuracy of 1.00.
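Once the weights are trained, the forward pass can be wrapped in a function for classifying new crops. The sketch below does that with hypothetical helper names (predict, confusion_counts) and thresholds the output with out > 0.5, which gives the same labels as the np.round(out) used in the loop for outputs between 0 and 1 (np.round sends exactly 0.5 to 0). It also splits the errors into missed pedestrians (fn) and false alarms (fp), which a single accuracy number hides. The demo runs on random inputs and random weights, so its counts mean nothing in themselves.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(x, w1, w2, bias=0.1, threshold=0.5):
    """Hypothetical helper: run the two-layer network forward and return 1 (pedestrian)
    or 0 (no pedestrian) per row of x. x must be standardized like the training data."""
    a1 = sigmoid(x @ w1 + bias)
    out = sigmoid(a1 @ w2 + bias)
    return (out > threshold).astype(int)

def confusion_counts(pred, labels):
    """Hypothetical helper: count true/false positives and negatives for 0/1 column vectors."""
    pred = pred.ravel().astype(int)
    labels = labels.ravel().astype(int)
    return {
        "tp": int(np.sum((pred == 1) & (labels == 1))),  # pedestrians found
        "fp": int(np.sum((pred == 1) & (labels == 0))),  # false alarms
        "fn": int(np.sum((pred == 0) & (labels == 1))),  # missed pedestrians
        "tn": int(np.sum((pred == 0) & (labels == 0))),  # correctly rejected crops
    }

# Demo on random data and random weights (placeholders for val_data, w1, w2).
rng = np.random.default_rng(4)
demo_x = rng.standard_normal((20, 648))
demo_pred = predict(demo_x, 0.001 * rng.standard_normal((648, 50)), 0.001 * rng.standard_normal((50, 1)))
print(confusion_counts(demo_pred, rng.integers(0, 2, size=(20, 1))))
```

In the notebook itself, the call would be confusion_counts(predict(val_data, w1, w2), val_labels), using the standardized validation data from the split above.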

We can also visualize the weights in each layer of the network, which often shows what features were learned during training. Each column of a weight matrix holds the incoming weights of a single node, so we can pick out one column and reshape it to the shape of the previous layer to see what that node was looking for there. For w1, the previous layer is the input image, so each column reshapes to an 18 x 36 picture of the pixel pattern that the corresponding hidden node responds to. Let's plot three of them (hidden nodes 5, 20, and 45).
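To look at all 50 hidden nodes at once instead of three, the columns of w1 can be tiled into one image. This is a sketch with a hypothetical helper, tile_hidden_weights, that reuses the notebook's reshape to [18, 36] and scales each tile to [0, 1] on its own so that nodes with small weights are still visible; the demo uses random weights in place of the trained w1.

```python
import numpy as np

def tile_hidden_weights(w1, img_shape=(18, 36), grid_cols=10, pad=1):
    """Hypothetical helper: arrange every column of w1 (one hidden node each) into a
    single 2-D mosaic. Each tile is min-max scaled to [0, 1] independently; the
    padding between tiles is filled with 0."""
    h, w = img_shape
    num_nodes = w1.shape[1]
    grid_rows = int(np.ceil(num_nodes / grid_cols))
    mosaic = np.zeros((grid_rows * (h + pad) - pad, grid_cols * (w + pad) - pad))
    for j in range(num_nodes):
        tile = w1[:, j].reshape(img_shape)  # same reshape as the notebook's imshow calls
        span = tile.max() - tile.min()
        tile = (tile - tile.min()) / (span if span > 0 else 1.0)
        r, c = divmod(j, grid_cols)         # grid position of node j
        mosaic[r * (h + pad): r * (h + pad) + h, c * (w + pad): c * (w + pad) + w] = tile
    return mosaic

# Demo with random weights standing in for the trained w1 (648 inputs x 50 hidden nodes).
demo_w1 = np.random.default_rng(5).standard_normal((648, 50))
print(tile_hidden_weights(demo_w1).shape)
```

With the trained weights, plt.imshow(tile_hidden_weights(w1), cmap='gray') would display the mosaic.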

fig=plt.figure()
ax1=fig.add_subplot(131)
ax1.imshow(w1[:, 5].reshape([18, 36]))
ax2=fig.add_subplot(132)
ax2.imshow(w1[:, 20].reshape([18, 36]))
ax3=fig.add_subplot(133)
ax3.imshow(w1[:, 45].reshape([18, 36]))

It is also possible to look at the activations of each layer, but this is usually more informative with convolutional neural networks, whose activations keep the spatial layout of the input image.