
Understanding Artificial Neural Networks using Python


The Artificial Neural Network (ANN) is inspired by the animal brain. Although ANNs are not yet as powerful as the brain, they are among the most powerful learning models in the field of machine learning.

Understanding the Human Brain and ANNs

Let's first understand how the human brain works and how it has influenced ANNs.


A biological neuron receives signals through its dendrites; these signals are either amplified or inhibited as they pass along the axon to the dendrites of other neurons.

ANNs are based on a similar concept: a network is a collection of a large number of simple devices called artificial neurons. The network learns to perform certain tasks (like identifying a car) by training the neurons to fire when a particular input (like an image of a car) is provided.

Perceptron

The perceptron is one of the earliest proposed models. It takes a weighted sum of the inputs and applies an activation function to it. It is a single-layer artificial neural network with only one neuron.



The perceptron consists of four parts:

  • Input values 
  • Weights and Bias
  • Net sum
  • Activation Function

The perceptron works as follows:

  • All the inputs (x1, x2, ...) are multiplied by their corresponding weights (w1, w2, ...), i.e. x1·w1, x2·w2, etc.
  • Take the summation of all the multiplied values; this is called the weighted sum.
  • Then an activation function is applied to the weighted sum. We can use a step function as the activation function. A step function outputs 1 when its input reaches a threshold and 0 otherwise; with the threshold absorbed into the bias, it is defined as:

f(z) = 1 if z ≥ 0, else 0
The perceptron is usually used to classify data into two parts. Therefore, it is also known as a Linear Binary Classifier, e.g. classifying a population into male and female.
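The post does not include code for the perceptron itself, so here is a minimal NumPy sketch of the idea; the AND-gate weights and bias are our own illustrative choice.

import numpy as np

def step(z):
    # Step activation: fire (1) if the weighted sum is non-negative.
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    # Weighted sum of the inputs plus bias, passed through the step function.
    return step(np.dot(w, x) + b)

# Illustrative example: a perceptron computing logical AND.
w, b = np.array([1.0, 1.0]), -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))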



Artificial Neuron

An artificial neuron is similar to a perceptron, except that the activation function is not a step function.


Below are some properties of the activation function.
  • It should be smooth and it should not have any abrupt changes.
  • It should make the output non-linear with respect to the input. Non-linearity is what lets a stack of layers represent functions that a single linear layer cannot, which in turn allows for more compact networks.
Below are some commonly used activation functions; a small NumPy sketch of them follows the list.

  • Hyperbolic Tangent Function

  • Rectified Linear Unit (ReLU)

  • Leaky and Parametric ReLU
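As a quick illustration, here is how these functions might be written in NumPy; the alpha = 0.01 default for the leaky variant is an assumption of this sketch.

import numpy as np

def tanh_act(z):
    # Hyperbolic tangent: smooth, outputs in (-1, 1).
    return np.tanh(z)

def relu_act(z):
    # Rectified Linear Unit: 0 for negative inputs, identity otherwise.
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Leaky ReLU keeps a small slope for negative inputs; in the
    # parametric variant, alpha is learned rather than fixed.
    return np.where(z > 0, z, alpha * z)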


Artificial Neural Network

An artificial neural network (ANN) is a network of such neurons. 

Neurons in a neural network are arranged in layers. The first and the last layer are called the input and output layers.

The input layer has as many neurons as the number of attributes in the data set.
For a classification problem, the output layer has as many neurons as the number of classes of the target variable.
For a regression problem, the number of neurons in the output layer would be 1.

Structure of ANN


Below are the parts of an ANN:
  • Network Topology
  • Input Layer
  • Output Layer
  • Weights
  • Activation functions
  • Biases
The leftmost layer in this network is called the input layer (x), and the neurons within it are called input neurons. The rightmost or output layer (O) contains the output neurons. The middle layers are called hidden layers (h1, h2, ..., hj), since the neurons in these layers are neither inputs nor outputs.

Given the complex nature of ANNs, we make the following simplifying assumptions:
  1. Neurons are arranged in layers, sequentially.
  2. Neurons within the same layer do not interact with each other.
  3. Neurons are densely connected, i.e. all the neurons in layer n are connected to all the neurons in layer n+1.
  4. There is a weight associated with each interconnection in the neural network, and each neuron has a bias associated with it.
  5. All neurons in a given layer use the same activation function.

Feedforward in Neural Networks

In a feedforward neural network, the output from one layer is used as the input to the next layer.

In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.

The main goal of a feedforward network is to approximate some function f*. For example, a regression function y = f*(x) maps an input x to a value y. A feedforward network defines a mapping y = f(x; θ) and learns the value of the parameters θ that result in the best function approximation.

The layers between the input layer and the output layer are known as hidden layers, since the training data does not show the desired output for these layers. A network can contain any number of hidden layers, each with any number of hidden units. A unit is essentially a neuron that takes input from the units of the previous layer and computes its own activation value.
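To make this concrete, here is a minimal sketch of a single forward pass through one hidden layer; the layer sizes and values are arbitrary illustrations, not taken from the original post.

import numpy as np

# Toy forward pass: 3 input attributes -> 4 hidden units -> 2 outputs.
x = np.array([[0.5], [0.1], [0.9]])               # input column vector
W1, b1 = np.random.randn(4, 3), np.zeros((4, 1))  # hidden layer parameters
W2, b2 = np.random.randn(2, 4), np.zeros((2, 1))  # output layer parameters

h = np.maximum(0, W1 @ x + b1)   # hidden activations (ReLU)
y = W2 @ h + b2                  # output; information only flows forward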

Backpropagation in Neural Networks

Back-propagation is the essence of neural net training. It is the method of fine-tuning the weights of a neural net based on the error obtained in the previous epoch (i.e., iteration). Proper tuning of the weights reduces the error rate and makes the model more reliable by improving its generalization.

Backpropagation is short for "backward propagation of errors." It is a standard method of training artificial neural networks. It calculates the gradient of the loss function with respect to all the weights in the network.

There is one important thing to note here: we minimize the average of the total loss, not the total loss itself. Since the two differ only by a constant factor 1/N, minimizing the average loss also minimizes the total loss.

The loss function is defined in terms of the network output F(xi) and the ground truth yi. Since F(xi) depends on the weights and biases, the loss L(F(xi), yi) is, in turn, a function of (w, b). The average loss across all N data points, which is what we want to minimize, is denoted by

G(w, b) = (1/N) Σi L(F(xi), yi)

Artificial Neural Network using Python

Here we use the MNIST dataset of handwritten digits. Let's see how we can use Python to create a neural network.

Loading and previewing the data.
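The loading code is not shown in the post; one possible way to get MNIST, assuming tensorflow.keras is installed (the variable names here are our own):

import numpy as np
from tensorflow.keras.datasets import mnist

# 60,000 training and 10,000 test images of 28x28 handwritten digits.
(train_x_orig, train_y_orig), (test_x_orig, test_y_orig) = mnist.load_data()
print(train_x_orig.shape, train_y_orig.shape)  # (60000, 28, 28) (60000,)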

Here the target variable has to be converted to a one-hot matrix. We use the function one_hot to convert the target labels to one-hot encoding.

The following function converts the data into the desired shape and also converts the ground-truth labels to a one-hot matrix.
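The original implementation is not shown; below is a sketch consistent with the description, assuming examples are stored as columns (shape (784, m)); prepare_data is our own helper name.

def one_hot(labels, num_classes=10):
    # Build a (num_classes, m) matrix with a 1 in row `label` of each column.
    m = labels.shape[0]
    Y = np.zeros((num_classes, m))
    Y[labels, np.arange(m)] = 1
    return Y

def prepare_data(x_orig, y_orig):
    # Flatten each 28x28 image into a 784-dim column and scale to [0, 1].
    m = x_orig.shape[0]
    X = x_orig.reshape(m, -1).T / 255.0   # shape (784, m)
    Y = one_hot(y_orig)                   # shape (10, m)
    return X, Y

train_x, train_y = prepare_data(train_x_orig, train_y_orig)
test_x, test_y = prepare_data(test_x_orig, test_y_orig)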

Now let's visualize the dataset by checking a few random samples.
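A minimal way to preview a random sample with matplotlib, using the arrays prepared above:

import matplotlib.pyplot as plt

# Show one randomly chosen training image together with its label.
index = np.random.randint(train_x.shape[1])
plt.imshow(train_x[:, index].reshape(28, 28), cmap='gray')
plt.title("Label: %d" % np.argmax(train_y[:, index]))
plt.show()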




Activation Functions

Let's define some activation functions which we will be using in our code.

Sigmoid
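A possible implementation; by the convention used in this sketch, each activation returns its input Z as the activation_memory that backpropagation will need later.

import numpy as np

def sigmoid(Z):
    # Squashes each value into (0, 1).
    H = 1 / (1 + np.exp(-Z))
    return H, Z   # Z is kept as activation_memory for backprop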



Relu
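Similarly, a sketch of relu:

def relu(Z):
    # Zeroes out negative values, passes positive values through unchanged.
    H = np.maximum(0, Z)
    return H, Z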


Softmax
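And a sketch of softmax; subtracting the column-wise maximum before exponentiating is a standard numerical-stability trick and our own addition.

def softmax(Z):
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))  # stable exponentials
    H = expZ / np.sum(expZ, axis=0, keepdims=True)       # each column sums to 1
    return H, Z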



Initializing Parameters

We will use the function initialize_parameters below, which initializes the weights and biases of the various layers.
One way to initialize is to set all the parameters to 0. This is not a good strategy: all the neurons would behave the same way, defeating the purpose of a deep network. Hence, we initialize the weights randomly to very small values, but not zeros.
The biases are initialized to 0. Note that the initialize_parameters function initializes the parameters for all the layers in one for loop.
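A sketch consistent with this description; the 0.01 scaling factor and the fixed random seed are assumptions of ours.

def initialize_parameters(dimensions):
    # dimensions: number of neurons per layer, e.g. [784, 45, 10]
    np.random.seed(1)   # assumed seed, for reproducible runs
    parameters = {}
    for l in range(1, len(dimensions)):
        # Small random weights break symmetry; biases can start at zero.
        parameters['W' + str(l)] = np.random.randn(dimensions[l], dimensions[l - 1]) * 0.01
        parameters['b' + str(l)] = np.zeros((dimensions[l], 1))
    return parameters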


Feed Forward

Layer Forward

The function layer_forward implements the forward propagation for a given layer 'l'. It calculates the cumulative input into the layer, Z, and uses it to calculate the output of the layer, H. It takes H_prev, W, b, and the activation function as inputs, and stores the linear_memory and activation_memory in the variable memory, which will be used later in backpropagation.
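A sketch of layer_forward along those lines, building on the activation functions above:

def layer_forward(H_prev, W, b, activation='relu'):
    Z = np.dot(W, H_prev) + b              # cumulative input into the layer
    if activation == 'relu':
        H, activation_memory = relu(Z)
    elif activation == 'softmax':
        H, activation_memory = softmax(Z)
    else:
        H, activation_memory = sigmoid(Z)
    linear_memory = (H_prev, W, b)         # needed for dW, db, dH_prev later
    memory = (linear_memory, activation_memory)
    return H, memory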


L_Layer_Forward

L_layer_forward performs one forward pass through the whole network for all the training samples (note that we feed all the training examples in one single batch).
We use the layer_forward function created above to perform the feedforward for layers 1 to L-1 inside the for loop, with the relu activation.
The last layer, which has a different activation (softmax), is calculated outside the loop. Notice that the memory of every layer is appended to memories; these will be used in reverse order during backpropagation.
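A sketch of L_layer_forward, assuming parameters holds one W and one b per layer:

def L_layer_forward(X, parameters):
    memories = []
    H = X
    L = len(parameters) // 2               # number of weight/bias pairs
    for l in range(1, L):                  # layers 1 .. L-1 use relu
        H, memory = layer_forward(H, parameters['W' + str(l)],
                                  parameters['b' + str(l)], 'relu')
        memories.append(memory)
    # The output layer uses softmax and is handled outside the loop.
    HL, memory = layer_forward(H, parameters['W' + str(L)],
                               parameters['b' + str(L)], 'softmax')
    memories.append(memory)
    return HL, memories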


Loss

The next step is to compute the loss after every forward pass, to keep checking that it decreases with training.
Here, compute_loss calculates the cross-entropy loss.
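A sketch of compute_loss; the small epsilon guarding against log(0) is our own addition.

def compute_loss(HL, Y):
    # Average cross-entropy over all m examples.
    m = Y.shape[1]
    return -np.sum(Y * np.log(HL + 1e-8)) / m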

Backpropagation

Now let's move to backpropagation.

Activation Functions

Sigmoid Backward
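A sketch of the sigmoid backward step, applying the derivative s * (1 - s) at the stored Z:

def sigmoid_backward(dH, activation_memory):
    Z = activation_memory
    s = 1 / (1 + np.exp(-Z))
    return dH * s * (1 - s)   # chain rule through the sigmoid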



Relu Backward
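And its relu counterpart; the gradient passes through only where the forward input was positive.

def relu_backward(dH, activation_memory):
    Z = activation_memory
    dZ = np.array(dH, copy=True)
    dZ[Z <= 0] = 0
    return dZ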



layer_backward

layer_backward uses dH to calculate dW, dH_prev, and db.
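A sketch of layer_backward; the 1/m factors average the gradients over the batch.

def layer_backward(dH, memory, activation='relu'):
    linear_memory, activation_memory = memory
    H_prev, W, b = linear_memory
    m = H_prev.shape[1]
    if activation == 'relu':
        dZ = relu_backward(dH, activation_memory)
    else:
        dZ = sigmoid_backward(dH, activation_memory)
    dW = np.dot(dZ, H_prev.T) / m                # gradient w.r.t. weights
    db = np.sum(dZ, axis=1, keepdims=True) / m   # gradient w.r.t. biases
    dH_prev = np.dot(W.T, dZ)                    # gradient passed to layer l-1
    return dH_prev, dW, db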


L_layer_backward

L_layer_backward performs backpropagation for the whole network. The backpropagation for the last layer, i.e. the softmax layer, is different from the rest; hence it is handled outside the reversed for loop.
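A sketch of L_layer_backward; it relies on the fact that softmax combined with cross-entropy gives the simple output-layer gradient dZ = HL - Y.

def L_layer_backward(HL, Y, memories):
    gradients = {}
    L = len(memories)
    dZ = HL - Y                            # softmax + cross-entropy shortcut
    (H_prev, W, b), _ = memories[L - 1]
    m = H_prev.shape[1]
    gradients['dW' + str(L)] = np.dot(dZ, H_prev.T) / m
    gradients['db' + str(L)] = np.sum(dZ, axis=1, keepdims=True) / m
    dH_prev = np.dot(W.T, dZ)
    # The remaining layers run in reverse order with the relu backward pass.
    for l in reversed(range(1, L)):
        dH_prev, dW, db = layer_backward(dH_prev, memories[l - 1], 'relu')
        gradients['dW' + str(l)] = dW
        gradients['db' + str(l)] = db
    return gradients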

Parameter Updates

This step updates the weights and biases.
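A sketch of the plain gradient-descent update:

def update_parameters(parameters, gradients, learning_rate):
    L = len(parameters) // 2
    for l in range(1, L + 1):
        # Step each weight matrix and bias vector against its gradient.
        parameters['W' + str(l)] -= learning_rate * gradients['dW' + str(l)]
        parameters['b' + str(l)] -= learning_rate * gradients['db' + str(l)]
    return parameters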



Model
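The training-loop code is not shown in the post; here is a sketch that ties the pieces above together (the default learning rate of 0.1 and the print frequency are our own choices).

def L_layer_model(X, Y, dimensions, learning_rate=0.1,
                  num_iterations=2000, print_loss=True):
    parameters = initialize_parameters(dimensions)
    for i in range(num_iterations):
        HL, memories = L_layer_forward(X, parameters)    # forward pass
        loss = compute_loss(HL, Y)                       # track progress
        gradients = L_layer_backward(HL, Y, memories)    # backward pass
        parameters = update_parameters(parameters, gradients, learning_rate)
        if print_loss and i % 100 == 0:
            print("Loss after iteration %d: %.4f" % (i, loss))
    return parameters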



The list dimensions specifies the number of neurons in each layer. For a neural network with one hidden layer of 45 neurons, we can specify the dimensions as follows:
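dimensions = [784, 45, 10]   # 784 input pixels, 45 hidden neurons, 10 digit classes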

Now, let's call the function L_layer_model on the dataset we have created. Let's start with 2000 iterations.
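With the sketches above, the call would look like:

parameters = L_layer_model(train_x, train_y, dimensions,
                           learning_rate=0.1, num_iterations=2000)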



The method below predicts the labels and reports the accuracy.
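A sketch of such a method, using the forward pass above and taking the argmax over the softmax outputs:

def predict(X, Y, parameters):
    HL, _ = L_layer_forward(X, parameters)
    predictions = np.argmax(HL, axis=0)   # most probable class per example
    accuracy = np.mean(predictions == np.argmax(Y, axis=0))
    print("Accuracy: %.2f%%" % (100 * accuracy))
    return predictions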

Let's see the accuracy we get on the training data.
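Using the predict sketch above:

train_predictions = predict(train_x, train_y, parameters)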


We get an accuracy of around 88% on the training data. Let's see how it performs on the test data.
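And on the test set:

test_predictions = predict(test_x, test_y, parameters)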
We can further improve the accuracy if we train it for a longer time.

Conclusion

Below we can see how well the model performs and which images have been wrongly classified. As stated earlier, we can improve the performance by changing the network architecture and training the model for a longer time.


Output:

Correctly Labeled:

[sample images of correctly classified digits]

Wrongly Labeled:


Labeled as 9 instead of 5.

Labeled as 9 instead of 4.
