The whole idea of artificial neural networks originates from the functioning of the best processor in the world: the brain. We try to build artificial neurons similar to the biological neurons in our brain and imitate their behaviour. In this article I'm going to explain what artificial neural networks actually are, what their components are, and how the layer concept works.

Moreover, to get a better idea, I'm going to explain these concepts using the classic example of handwritten digit recognition, the "hello world" of neural networks.

Before getting into artificial neurons, let's take a quick look at the biological neuron.


First and foremost, let me give you the basic structure of a neuron in the brain.


Neurons are the functional units of the nervous system. Let me outline the basic functions of a neuron:

  1. Receive signals or information from the outside world through the dendrites. (For example, when we look at something, it creates impulses in the visual cortex of our brain, and these impulses are transferred to different parts of our body through neurons so that we can respond to what we see.)
  2. Do some processing or computation on the incoming signals and determine whether they should be passed on to the next neuron, cell, or muscle. This takes place in the cell body.
  3. Pass the processed information from the cell body to the next neuron through the output structure called the axon.

The next neuron receives this impulse through its input dendrites, and the process continues through millions of neurons in our body until an output is produced.


Not every impulse gets passed from one neuron to another; it depends on a few factors. For example, a neuron receives thousands of impulses through its dendrites, and these impulses can be either excitatory (they push the neuron towards firing an electrical impulse to the next neuron) or inhibitory (they push the neuron towards staying silent). Whether an impulse is fired depends on the sum of all these excitatory and inhibitory inputs, and this summation takes place in the cell body.

Okay. Now I think you have an idea about how neurons function in the brain. Let me jump to the artificial neuron.



Mmmm… let me explain why.

Actually, for complex non-linear hypotheses it is better to use neural networks. But what does that mean?

Consider an example from computer vision. Say I want to classify images of cats in a given set of images. If I took the traditional machine learning approach of logistic regression for this purpose, I would end up with an enormous number of features. Why? Because the computer sees the image as pixels with different brightness or intensity values. If I give an image of a cat as a training example, say an image of 50x50 pixels, the dimension of the feature space (the number of original features) is already 2,500 pixels. If we then include quadratic terms (products of pairs of pixels) to capture non-linearity, our hypothesis ends up with millions of features. It is computationally too expensive to find the parameters for millions of features for each training example.
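To make that concrete, here is a quick back-of-the-envelope calculation (a plain-Python sketch, with the 50x50 image size from above) of how fast the feature count explodes once we add quadratic terms:

```python
# Back-of-the-envelope feature count for a 50x50 grayscale image.
width, height = 50, 50
n_pixels = width * height  # each pixel is one raw feature

# Quadratic terms: all products x_i * x_j with i <= j, which is how
# you would capture pixel interactions in plain logistic regression.
n_quadratic = n_pixels * (n_pixels + 1) // 2

print(n_pixels)     # 2500 raw features
print(n_quadratic)  # 3126250 quadratic features, already millions
```

So even a tiny 50x50 image pushes a quadratic logistic regression hypothesis past three million features.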

How a computer sees an image

Our hypothesis may look similar to the one below, containing a lot of features.

logistic regression

Therefore, this is just not a good way to learn complex non-linear hypotheses when the number of original features is large, because you end up with too many new features.

And for most real-world machine learning problems there are a lot of features to consider. Therefore we need a different way to tackle this issue, and neural networks are a great way to do it.


Artificial neurons function similarly to the biological neuron we discussed earlier. An example of an artificial neuron is the perceptron. But in this article, as in most modern work, a slightly modified version of the perceptron called the sigmoid neuron is used. I will discuss the difference in later articles.

An artificial neural network consists of one input layer, one output layer, and usually one or more hidden layers. If there are no hidden layers, the network can only handle linearly separable problems. As hidden layers are added, the hypothesis becomes non-linear in nature. Take a quick look at the function computed in each of the neurons of the network (a more detailed explanation comes later in this article); it is called the activation function.

Wait! Let's crack this concept with a classic example: handwritten digit recognition, explained below.


As the name suggests neural networks are inspired by our brain.

But let's break that down!

What exactly are neurons, and in what sense are they linked together?

For now, just think of a neuron as something that holds a number, specifically a number between 0 and 1; it's nothing more than that. Here I take a value between 0 and 1 because it is the grayscale value of a pixel: 0 represents black (total absence of intensity) and 1 represents white (full intensity). This squishing of values between 0 and 1 is motivated by the biological analogy of neurons being either 'active' or 'inactive'.

For example, the network starts with a bunch of neurons corresponding to each of the 28 x 28 pixels of the input image, which is 784 neurons in total. Each one of these holds a number representing the grayscale value of the corresponding pixel, ranging from 0 for a black pixel to 1 for a white pixel.
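To see what "lighting up" the input layer means in code, here is a minimal sketch in plain Python. The tiny image here is made up (real MNIST pixels would come from the dataset); the point is only the flattening of a 28 x 28 grid of 0-255 intensities into 784 activations between 0 and 1:

```python
# A made-up 28x28 "image": pixel intensities from 0 (black) to 255 (white).
image = [[0] * 28 for _ in range(28)]
image[10][14] = 255  # one fully white pixel, just for illustration
image[11][14] = 128  # one gray pixel

# Flatten row by row and scale to [0, 1]: these are the 784 input activations.
input_layer = [pixel / 255.0 for row in image for pixel in row]

print(len(input_layer))           # 784 neurons in the input layer
print(input_layer[10 * 28 + 14])  # 1.0, the white pixel
```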

This number inside the neuron is called its activation, and the image you might have in mind is that each neuron lights up when its activation is a high number, close to 1 or greater than a certain threshold.

So all of these 784 neurons make up the first layer of our network.


All 784 of those neurons in the image are laid out in a line to form the first (input) layer. Take a look below.

For simplicity I’ve omitted most of the 784 input neurons in the diagram above.

Now jumping over to the last layer: it has ten neurons, each representing one of the digits (we want to classify digits from 0 to 9).

The activation in these neurons, again some number between zero and one, represents how much the system thinks a given image corresponds to a given digit. The brightest neuron gives the result.


The output layer of the network contains 10 neurons. If the first neuron fires, i.e., has an output≈1, then that will indicate that the network thinks the digit is a 0. If the second neuron fires then that will indicate that the network thinks the digit is a 1. And so on. A little more precisely, we number the output neurons from 0 through 9, and figure out which neuron has the highest activation value. If that neuron is, say, neuron number 6, then our network will guess that the input digit was a 6. And so on for the other output neurons.
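That "pick the neuron with the highest activation" step is just an argmax. A minimal sketch, with made-up activation values for the ten output neurons:

```python
# Hypothetical activations of the 10 output neurons, indexed 0 through 9.
output_activations = [0.01, 0.02, 0.05, 0.02, 0.01, 0.03, 0.93, 0.10, 0.04, 0.08]

# The network's guess is the index of the highest activation.
guess = max(range(10), key=lambda digit: output_activations[digit])
print(guess)  # 6, so the network thinks the image is a six
```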

There are also a couple of layers in between, called hidden layers, which for the time being should just be a giant question mark for how on earth this process of recognizing digits is going to be handled. In this network I chose two hidden layers, each with 16 neurons, and admittedly that's kind of an arbitrary choice. To be honest, I chose two layers to build a good understanding of hidden layers, but why 16? That was just a number that fits nicely on the screen; in practice you would tune it.

The way the network operates, activations in one layer determine the activations of the next layer.

And of course the heart of the network as an information processing mechanism comes down to exactly how those activations from one layer bring about activations in the next. It's similar to how, in biological networks of neurons, some groups of neurons firing cause certain others to fire.

Now, the network I've shown you here has already been trained to recognize digits, and let me show you what I mean by that.

It means that if you feed in an image, lighting up all 784 neurons of the input layer according to the brightness of each pixel, that pattern of activations causes some very specific pattern in the next layer, which causes some pattern in the one after it, which finally gives some pattern in the output layer. The brightest neuron of that output layer is the network's choice, i.e., the digit the network thinks the image represents.
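To give a feel for the mechanics of one layer's pattern producing the next, here is a toy forward pass in plain Python with the 784-16-16-10 shape from this article. The weights and the input are random placeholders (a trained network would have learned its weights); only the layer-by-layer flow is the point:

```python
import math
import random

random.seed(0)

def sigmoid(x):
    # Squishes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(activations, weights, biases):
    # Each output neuron: sigmoid of (weighted sum of inputs + bias).
    return [
        sigmoid(sum(w * a for w, a in zip(neuron_weights, activations)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Network shape from the article: 784 -> 16 -> 16 -> 10.
sizes = [784, 16, 16, 10]
weights = [[[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [[random.uniform(-1, 1) for _ in range(n_out)] for n_out in sizes[1:]]

# A fake input image: 784 activations between 0 and 1.
activations = [random.random() for _ in range(784)]
for w, b in zip(weights, biases):
    activations = layer_forward(activations, w, b)

print(len(activations))  # 10 output activations, one per digit
```

With random weights the output is meaningless noise; training is what turns this machinery into a digit recognizer.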

And before jumping into the math of how one layer influences the next, or how training works, let's talk about why it's even reasonable to expect a layered structure like this to behave intelligently.


What are we expecting here? What is our best hope for what those middle layers might be doing?

Well, when you or I recognize digits, we piece together various components. For example, a '9' has a loop up top and a line on the right. Similarly, an '8' has a loop up top paired with another loop down below. A '4' basically breaks down into three specific lines, and so on.

Now, in a perfect world we might hope that each neuron in the second-to-last layer corresponds to one of these subcomponents; that any time you feed in an image with, say, a loop up top, like a 9 or an 8, there is some specific neuron whose activation is going to be close to one. The hope would be that any general loopy pattern towards the top sets off this neuron. That way, going from the third layer to the last one just requires learning which combination of subcomponents corresponds to which digit. Look at the images below.

But how would you recognize these subcomponents, or even learn what the right subcomponents should be? And I still haven't talked about how one layer influences the next, but run with me on this one for a moment.


Recognizing a loop can also break down into subproblems. One reasonable way to do this would be to first recognize the various little edges that make it up. Similarly, a long line, like the kind you might see in the digits 1, 4, or 7, can be subdivided further into small line segments. Look below.

So maybe our hope (actually we are only assuming it works like this; the important thing is that to train the network you just need to give it inputs and labels, and it will figure out all the necessary features by itself. This means that rather than saying the 1st hidden neuron detects a circular shape, the 2nd detects a horizontal line, and the 3rd detects a vertical line, we let the neural network detect whatever features it needs for making predictions. I'm assuming this breakdown for the sake of understanding only) is that each neuron in the second layer of the network corresponds to one of the various relevant little edges. Maybe when an image like this '9' comes in, it lights up all of the neurons associated with around eight to ten specific little edges, as shown below,

which in turn light up the neurons associated with the upper loop (because 9 has a loop at the top) and a long vertical line, and those light up the neuron associated with a nine.


Whether or not this is what our final network actually does is another question, one I will come back to once we see how to train the network. But this is the hope we might have, a sort of goal for the layered structure. Moreover, you can imagine how being able to detect edges and patterns like this would be really useful for other image recognition tasks. And even beyond image recognition, there are all sorts of intelligent tasks that break down into layers of abstraction: parsing speech, for example, involves taking raw audio and picking out distinct sounds, which combine to make syllables, which combine to form words, which combine to make up phrases and more abstract thoughts.

But getting back to how any of this actually works, picture yourself right now designing exactly how the activations in one layer determine the activations in the next. The goal is to have some mechanism that could conceivably combine pixels into edges, edges into patterns, or patterns into digits.

Consider a specific example: let's say there is one particular neuron in the second layer that checks whether or not the image has an edge in this region here (denoted by the white space below). That particular neuron is assigned a particular function for recognizing that edge, and this function is defined by certain parameters, or weights.

Therefore the question at hand should be: what parameters should the network have? Well, what we will do is assign a weight to each of the connections between our neuron and the neurons of the first layer. These weights are just numbers (and these parameters can be found through algorithms like gradient descent).

Then take all the activations from the first layer and compute their weighted sum according to these weights.

Now, if we make the weights associated with almost all of the pixels zero (black), except for some positive weights (those overlapping with the image of '7' given below) in the region we care about,

then taking the weighted sum of all the pixel values really just amounts to adding up the values of the pixels in the region we care about, that is, the small portion of the region we considered. Look at the figure given below.

We are only considering the weights and pixels of the small rectangular region shown in the above figure for that particular neuron in the second layer. In the same way, other neurons in the second layer consider the remaining edges of the digit '7'. The neurons detecting these edges will fire up to the next layer of neurons (the 3rd layer). This way the third layer detects larger patterns, and finally the network outputs the digit.
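As a tiny sketch of that idea (both the image patch and the chosen region here are made up), this is what giving a neuron positive weights only inside a small region does to its weighted sum:

```python
# A made-up 5x5 patch of pixel activations (0 = black, 1 = white),
# with a bright horizontal edge in row 1.
image = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 1.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

# Weights: 1 inside the region we care about (row 1, columns 1-3), 0 elsewhere.
weights = [[1.0 if (r == 1 and 1 <= c <= 3) else 0.0 for c in range(5)]
           for r in range(5)]

# The weighted sum then just adds up the pixels inside that region.
weighted_sum = sum(w * p for wrow, prow in zip(weights, image)
                   for w, p in zip(wrow, prow))
print(weighted_sum)  # 2.7, large only when there is an edge in the region
```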

Point to note

Not all neurons "fire" all the time. Each neuron receives inputs from the neurons to its left, and the inputs are multiplied by the weights of the connections they travel along. Every neuron adds up all the inputs it receives in this way, and (in the simplest kind of neural network) if the sum is more than a certain threshold value, the neuron "fires" and triggers the neurons it's connected to (the neurons on its right).
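That rule is the classic perceptron-style threshold neuron. A minimal sketch, with made-up inputs, weights, and thresholds:

```python
def threshold_neuron(inputs, weights, threshold):
    # Multiply each input by its connection weight, add them up,
    # and "fire" (output 1) only if the sum exceeds the threshold.
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

inputs = [0.5, 0.9, 0.2]
weights = [1.0, 2.0, -1.0]  # made-up weights
# Weighted sum: 0.5*1.0 + 0.9*2.0 + 0.2*(-1.0) = 2.1
print(threshold_neuron(inputs, weights, threshold=2.0))  # 1, it fires
print(threshold_neuron(inputs, weights, threshold=3.0))  # 0, it stays silent
```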

When you compute weighted sums like these, you might come across any number from -∞ to +∞, but for this network what we want is for the activations to be values between 0 and 1.

So a common thing to do is to pump this weighted sum into a function that squishes the real number line into the range between 0 and 1. A common function that does this is the sigmoid function, also known as the logistic curve. Basically, very negative inputs end up close to zero, very positive inputs end up close to 1, and it steadily increases around an input of zero.

So the activation of the neuron here is basically a measure of how positive the relevant weighted sum is. But maybe it's not that you want the neuron to light up whenever the weighted sum is bigger than 0; maybe you only want it to be active when the sum is bigger than, say, 10. That is, you want some bias for it to be inactive. What we do then is just add some other number, like -10, to the weighted sum before plugging it through the sigmoid function. That additional number is called the bias.
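Putting the weighted sum, the bias, and the sigmoid together, a single sigmoid neuron looks like this (the inputs, weights, and bias values are made up for illustration):

```python
import math

def sigmoid(x):
    # Squishes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_neuron(inputs, weights, bias):
    # Weighted sum of inputs, shifted by the bias, squished into (0, 1).
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)

inputs = [1.0, 1.0, 1.0]
weights = [4.0, 4.0, 4.0]  # weighted sum = 12

# With bias -10, the neuron only becomes active once the sum clears ~10.
print(sigmoid_neuron(inputs, weights, bias=-10.0))  # about 0.88, active
print(sigmoid_neuron(inputs, weights, bias=-20.0))  # about 0.0003, inactive
```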

So the weights tell you what pixel pattern (which pattern of edges) this neuron in the second layer is picking up on, and the bias tells you how high the weighted sum needs to be before the neuron starts getting meaningfully active (before it fires). And that is just one neuron. Every other neuron in this layer is connected to all 784 pixel neurons from the first layer, and each of those 784 connections has its own weight associated with it.

If you have a network with multiple hidden layers, the 1st hidden layer will detect some components, these will be used by the 2nd hidden layer to detect other subcomponents, which in turn will be used by the 3rd hidden layer, and so on up to the second-to-last layer; the last layer then makes the prediction.


The sigmoid has actually become an old-school approach by now. Instead of the sigmoid, the Rectified Linear Unit (ReLU) is commonly used nowadays because it is much easier to train. A detailed explanation will be given in later articles.
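For reference, here is the ReLU next to the sigmoid; it simply passes positive inputs through unchanged and zeroes out negative ones:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Rectified Linear Unit: 0 for negative inputs, identity for positive ones.
    return max(0.0, x)

for x in (-3.0, 0.0, 3.0):
    print(x, round(sigmoid(x), 3), relu(x))
# -3.0 -> sigmoid ~0.047, relu 0.0
#  0.0 -> sigmoid  0.5,   relu 0.0
#  3.0 -> sigmoid ~0.953, relu 3.0
```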

Okay, that's it for now. I will be writing more articles on this topic in the future. Do give a clap if you like this article.

This article is mainly inspired by the neural networks playlist of the 3Blue1Brown YouTube channel. It really helped me crack this hard concept, so check out those videos if you haven't. And for those who want to learn more, I highly recommend Michael Nielsen's book introducing neural networks and deep learning: http://neuralnetworksanddeeplearning.com/

Machine learning enthusiast