There are times in our data science journey when we’re applying regression (whether linear or logistic) and we reach a crossroads: choosing between model complexity and goodness of fit.
We’ve all been there, when the feeling of discombobulation settles in and the questions rain down:
Founded on information theory…
It’s a classification algorithm. A means of grouping observations based on shared characteristics.
More specifically, in the case of logistic regression, we classify based on whether something does or does not happen. It’s black or white, with no room for grey. While explanatory variables may vary, the outcome variable must be binary (0/1, True/False, Yes/No, Pass/Fail, etc.).
Thus, when we classify our data, we consider outcome variables that tend toward extreme ends and then fit a logistic (S-shaped) curve, which is a straight line on the log-odds scale, to aid in distinguishing between them.
For a refresher on variable definitions CLICK HERE.
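To make the binary idea concrete, here’s a minimal sketch of how a logistic model turns a straight line into a yes/no call. The coefficients and the `hours` values are made up purely for illustration (nothing here is fitted to real data), and the 0.5 cutoff is an assumed convention rather than a rule.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(p):
    """Logit: natural log of the odds p / (1 - p); the inverse of sigmoid."""
    return np.log(p / (1.0 - p))

# Hypothetical coefficients, chosen for illustration only (not fitted).
intercept, slope = -4.0, 1.0
hours = np.array([1.0, 3.0, 4.0, 5.0, 8.0])  # made-up explanatory variable

z = intercept + slope * hours      # a straight line on the log-odds scale
prob = sigmoid(z)                  # the S-shaped curve of probabilities
label = (prob >= 0.5).astype(int)  # binary outcome: 0 or 1, nothing in between

for h, p, c in zip(hours, prob, label):
    print(f"hours={h:.0f}  probability={p:.3f}  class={c}")
```

Running `log_odds(prob)` hands back `z`, which is the link between the logistic curve and the logarithm.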
Note the tie between logistic and logarithmic. When…
A simple linear regression model is the easiest model to interpret.
Point. Blank. Period.
With this said, fitting a response against a single explanatory variable limits what our model can represent and is not applicable in many cases.
Sometimes, to improve our model and better represent the data under consideration, we’ve got to incorporate more variables.
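Here’s a rough sketch of what that looks like in practice, using NumPy’s least squares solver. The data are invented so that `y` depends on two variables (`x1` and `x2`); the helper names `fit_ols` and `r_squared` are my own, not from any library.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of the variance in y that the fitted values explain."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended to X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

# Made-up data: y is built as 1 + 2*x1 + 3*x2, with no noise.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

coef_simple, fit_simple = fit_ols(x1, y)                        # one variable
coef_multi, fit_multi = fit_ols(np.column_stack([x1, x2]), y)   # two variables

print("simple   R^2:", round(r_squared(y, fit_simple), 3))
print("multiple R^2:", round(r_squared(y, fit_multi), 3))
print("multiple coefficients:", np.round(coef_multi, 3))
```

Keep in mind that in-sample R² never goes down when a variable is added, so a higher number alone doesn’t prove the extra variable belongs in the model.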
This leads, naturally, to the next step …
Multiple linear regression, or multi regression for short, doesn’t overcome all the weaknesses of a simple linear regression model, but it does expand our capability of representation.
Accounting for more independent variables enables us to better represent the…
When we say “elementary” linear regression, what we mean is that we’re taking the simplest principles, the building blocks of some idea (in this case linear regression), and exploring these building blocks in their absolute simplest form.
For this reason, it’s more commonly referred to as simple linear regression.
Our application of linear regression is an attempt not only to explore but also to quantify (put a number to) the relationship between a quantitative response variable and one or more explanatory variables.
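As a small illustration of “putting a number to” a relationship, the sketch below fits a line to five made-up points using the textbook least squares formulas for one explanatory variable. The data are invented, and the variable names are just for this example.

```python
import numpy as np

# Made-up data for illustration: y rises by roughly 2 for each unit of x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least squares estimates for a single explanatory variable:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_dev = x - x.mean()
y_dev = y - y.mean()
slope = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
intercept = y.mean() - slope * x.mean()

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")

# Cross-check against NumPy's built-in degree-1 polynomial fit.
check_slope, check_intercept = np.polyfit(x, y, deg=1)
```

The slope is the number we were after: the estimated change in the response for a one-unit change in the explanatory variable.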
Before moving forward, let’s quickly review a couple of definitions: