ML FUNDAMENTALS: MULTIPLE LINEAR REGRESSION EXPLAINED.

Arjun S
5 min read · Jan 21, 2021

To get better insight into multiple linear regression, let me give you a quick walkthrough of simple linear regression.

SIMPLE LINEAR REGRESSION.

In simple linear regression we had only a single feature and a corresponding result. To be precise, there was only one independent and one dependent variable: a single input, with a one-to-one relationship between input and output. Our hypothesis was represented by

Y = b0 + b1X

where b0 is the Y-intercept, b1 is the slope, X is the feature or independent variable, and Y is the predicted value or dependent variable. The parameters b0 and b1 are estimated through regression (various algorithms like gradient descent can be used to calculate them, and suitable parameters make the line best fit the dataset).

Yeah!!! This is the equation of a straight line. In simple (univariate) linear regression, the hypothesis for the training data is a straight line.

Let's look at an example of how simple linear regression can help solve a real-life problem.

Let me give you an example to get a clear picture of this. Suppose you plan on going on a trip with your friends. It's a 12-hour trip and you have 1500 miles to cover. You have been put in charge of calculating all the vehicle expenses during the trip, and for now you consider only the gas cost. You want to know how much money your car needs for a 1500-mile journey. How? Luckily, you have been noting your car's efficiency for the last 6 months, so you have data showing the total distance covered vs. the total cash paid for gas over that distance.

When this data is plotted on a graph, it turns out to follow a linear pattern: as the distance travelled increases, the gas cost also increases. From this data we can now predict the gas cost for the 1500-mile drive. Just fit a hypothesis (a straight line) to the data in the best manner, and from that hypothesis you can predict the gas cost. The point to note is that we have considered only the gas cost as the vehicle's expense for the trip here (that is, only a single feature is used to predict the total expense).
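To make this concrete, here is a minimal sketch in Python (assuming NumPy is available). The mileage and cost numbers below are made up purely for illustration, not taken from any real log:

```python
import numpy as np

# Hypothetical 6 months of logs: miles driven vs. gas cost in dollars.
miles = np.array([120, 250, 400, 610, 800, 1000])
gas_cost = np.array([18, 36, 60, 88, 118, 145])

# Fit a straight line gas_cost = b0 + b1 * miles by least squares.
# np.polyfit returns coefficients highest degree first: [b1, b0].
b1, b0 = np.polyfit(miles, gas_cost, deg=1)

# Predict the gas expense for the planned 1500-mile trip.
trip_cost = b0 + b1 * 1500
print(f"Estimated gas cost for 1500 miles: ${trip_cost:.2f}")
```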

For more understanding of simple linear regression, you can check out my previous post, simple linear regression.

MULTIPLE LINEAR REGRESSION

OK, now let me give you an understanding of multiple linear regression. Consider the same example. There, we considered only the gas cost as the vehicle's total expense during the trip. But do you think a vehicle will have only this expense on a long trip like this? Suppose the vehicle's tyres were changed at frequent intervals, and the wheel alignment was also fixed at certain intervals. Adding these features to the dataset (tyre changes and wheel-alignment fixes) helps produce a more accurate solution than using just one feature (the gas cost alone). Suppose we have been recording all these expenses on our computer for the last 2 years; from this we can derive our dataset.

This takes us to multiple linear regression.

Linear regression with multiple variables (features or inputs) is known as multiple linear regression or multivariate linear regression.

The main advantage of multiple linear regression over simple linear regression is that it models the relationship between several inputs and the output, which in turn generally yields more accurate predictions.

Now you know that there may be 2 or more input variables in this regression, so to represent the features let me fix some notation: m is the number of training examples, n is the number of features, x1, x2, …, xn are the input features, and θ0, θ1, …, θn are the parameters.

Unlike simple linear regression, it is difficult to draw a graph of the predictors (inputs) against the response (output), because here there are 2 or more inputs. As the number of features increases, the dimension in which the graph must be presented also increases. For example, a 2-feature training example can only be depicted in a 3-dimensional graph; for n features, n+1 dimensions are required to depict the graph.

Let me give you an example. With 2 features, the plot has a 3-D view: the training examples are points in space, and the fitted hypothesis is a plane passing through them rather than a line.
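If you want to see this yourself, here is one way such a plot could be produced with matplotlib. The data and the plane's coefficients below are invented just for the picture:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Made-up data: two features x1, x2 and a response y that is
# roughly a plane plus noise, purely for illustration.
x1 = rng.uniform(0, 10, 40)
x2 = rng.uniform(0, 10, 40)
y = 2.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(0, 1.5, 40)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x1, x2, y)  # training examples as points in 3-D space

# Hypothesis surface: a plane over a grid of (x1, x2) values.
g1, g2 = np.meshgrid(np.linspace(0, 10, 10), np.linspace(0, 10, 10))
ax.plot_surface(g1, g2, 2.0 + 1.5 * g1 + 0.8 * g2, alpha=0.3)

ax.set_xlabel("x1")
ax.set_ylabel("x2")
ax.set_zlabel("y")
plt.show()
```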

The multivariable form of the hypothesis function accommodating these multiple features is as follows:

hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + ⋯ + θnxn

where x1, x2, x3, …, xn are the input features and θ0, θ1, θ2, …, θn are the corresponding parameters.
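As a quick sanity check, this formula translates directly into a few lines of Python. The theta and x values below are arbitrary placeholders:

```python
def hypothesis(theta, x):
    """h(x) = theta0 + theta1*x1 + ... + thetan*xn.

    theta has n+1 entries (theta0 ... thetan); x has n entries (x1 ... xn).
    """
    h = theta[0]  # theta0 is the intercept term
    for theta_j, x_j in zip(theta[1:], x):
        h += theta_j * x_j
    return h

# Placeholder values for a 3-feature example.
print(hypothesis([1.0, 0.5, 2.0, -1.0], [3.0, 4.0, 5.0]))  # 1 + 1.5 + 8 - 5 = 5.5
```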

If there were only the x1 variable (a single input), the hypothesis would look like this: hθ(x) = θ0 + θ1x1. This equation simply depicts a straight line, and therefore it is the hypothesis function for simple linear regression.

But as the number of inputs increases, the hypothesis also needs to expand into higher dimensions.

Wait!!! But what does our example look like now?

Our new equation for the hypothesis is hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3,

where x1 is the total distance travelled for a specific amount of gas, x2 is the total distance travelled by a tyre before replacement (simply the lifespan of a tyre), and x3 is the total distance travelled before the wheel alignment is fixed.

θ0, θ1, θ2, θ3 are the parameters or coefficients.

hθ(x) gives the total expense of travel (predicted). This might vary slightly from the real expense given in the dataset. Our main aim is to make this variation between the predicted value and the real value (the error) as low as possible.

Consider a sample from the dataset. Suppose your vehicle travelled 300 miles on a specific amount of gas, 1500 miles before a tyre was replaced, and 300 miles before the wheel alignment was fixed. Then our hypothesis looks like this:

hθ(x) = θ0 + θ1·300 + θ2·1500 + θ3·300
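With some hypothetical parameter values (imagine they have already been estimated from the data), the prediction for this sample works out like this:

```python
# Hypothetical parameter values; in practice these would be learned
# from the dataset (e.g., by gradient descent).
theta0, theta1, theta2, theta3 = 20.0, 0.10, 0.02, 0.05

# Features for this sample: 300 miles per tank of gas,
# 1500 miles per tyre, 300 miles per alignment fix.
expense = theta0 + theta1 * 300 + theta2 * 1500 + theta3 * 300
print(expense)  # 20 + 30 + 30 + 15 = 95.0
```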

Using the definition of matrix multiplication, our multivariable hypothesis function can be concisely represented as

hθ(x) = θᵀx

where θ = [θ0, θ1, …, θn]ᵀ is the parameter vector and x = [x0, x1, …, xn]ᵀ is the feature vector with x0 = 1. This is a vectorization of our hypothesis function for one training example.

The above equation is the same hypothesis equation we discussed earlier, that is

hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + ⋯ + θnxn

For more than one training example, you can calculate the hypothesis as a column vector of size (m × 1) with

h = Xθ

where X is the (m × (n+1)) matrix whose rows are the training examples, each with x0 = 1 in the first column.

(There are ’m’ training examples and each training example has ‘n’ features)
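In NumPy this batch computation is a single matrix product. The sketch below uses invented feature values and the same hypothetical θ as before:

```python
import numpy as np

# Three hypothetical training examples (m = 3) with n = 3 features each.
features = np.array([
    [300, 1500, 300],
    [280, 1400, 350],
    [320, 1600, 280],
])

m = features.shape[0]
# Design matrix X: prepend a column of ones so x0 = 1 multiplies theta0.
X = np.hstack([np.ones((m, 1)), features])

theta = np.array([20.0, 0.10, 0.02, 0.05]).reshape(-1, 1)  # (n+1) x 1

h = X @ theta  # (m x (n+1)) @ ((n+1) x 1) -> (m x 1) column of predictions
print(h.shape)   # (3, 1)
print(h.ravel())
```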

More topics like feature scaling, the cost function, and gradient descent will be discussed in future articles.

