Simple Linear Regression Models and the Math Behind Them.
A linear regression model establishes a linear relationship between the dependent and independent variables in a model. So what exactly do i mean by the dependent and independent variables?
Suppose you were establishing the relationship between the number of years of experience and the salary someone should earn. The graph would look something like this:
As you can see, the more experienced someone is, the more they earn(according to the chart above).
In this case, the salary will depend on the experience level. The salary is therefore the dependent variable(its depending on something) and the experience will be the independent variable.
Another way to think of it is: The dependent variable is what is being influenced/what we are trying to predict and the independent variable is what is influencing it.
Another question would be: What is the math behind this model?
Assuming you did some high school math, the equation of a linear graph is:
Y = MX + C
If i match that to the salary vs experience graph,
Y - Salary (Target/dependent variable)
M- The slope/coefficient of the graph. This will also tell us what kind of relationship to expect. If it is a positive number, then the dependent variable will increase as the independent value increases. If it is a negative value, then as the value of one increases, the value of the other decreases.
X- the experience (Independent variable)
C- The intercept value. This is where the line will cross the Y-Axis.
As you also can note, the line graph does not necessarily touch all the data points(in blue). The line generated, known as the linear regression line is known as the line of best fit.
In layman's terms, this is a straight line that gives the best approximation of the data set.
A rough way of estimating this would be to draw a straight line through as many points as possible, so that the number of data points above and below the line are somewhat equal.
This can however also be mathematically calculated using the least squared method. An explanation can be found here:
Now, suppose the task was to establish how much someone who has 3.5 years of experience should earn:
All you need to do is draw a straight line to meet the line and the corresponding salary value will be the salary value. This is the whole idea behind a linear model.
As an example I have created a repository for a beginner Linear Regression model on my GitHub. https://github.com/AbigaelN2021/AndelaLearningCommunity/tree/main/LinearRegression