There are times in our data science journey when we’re in the process of applying regression (whether linear or logistic) and we reach the crossroads of choosing *between* model complexity and goodness of fit.

We’ve all been there. The feeling of discombobulation settles in and the questions rain down:

- how do we decide between models?
- how do we decide which variables to keep in our model?
- how do we toe the line between casting too simple vs. too complex a model? and
- how do we decide between whole models that may contain completely different variables?

Founded on information theory…
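One common information-theoretic tool for answering the questions above is the Akaike Information Criterion (AIC), which rewards goodness of fit while penalizing complexity. Here’s a minimal sketch, using made-up data and a hand-rolled `aic` helper, of scoring two candidate models (lower AIC is better):

```python
# Hypothetical sketch: comparing candidate regressions with AIC,
# AIC = n * log(RSS / n) + 2k for Gaussian errors (up to a constant),
# where RSS is the residual sum of squares and k the number of parameters.
import numpy as np

def aic(y, X):
    """Fit OLS by least squares and return the model's AIC."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # noise variable, unrelated to y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

ones = np.ones(n)
X_simple = np.column_stack([ones, x1])       # intercept + x1
X_complex = np.column_stack([ones, x1, x2])  # adds the noise variable

print("AIC, simple model: ", round(aic(y, X_simple), 2))
print("AIC, complex model:", round(aic(y, X_complex), 2))
```

Because `x2` carries no information about `y`, the extra parameter’s penalty typically outweighs its tiny improvement in fit, so AIC nudges us toward the simpler model.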

It’s a classification algorithm: a means of grouping observations based on shared characteristics.

More specifically, in the case of logistic regression, we classify based on whether something does or does not happen. It’s black or white, with no room for grey. While explanatory variables may vary, the outcome variable must be binary (0/1, True/False, Yes/No, Pass/Fail, etc.).

Thus, when we classify our data, we consider outcome variables that tend toward extreme ends and then fit a logarithmi*c* curve to aid in distinguishing between them.
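A minimal sketch of this idea, assuming a hypothetical pass/fail dataset (hours studied as the explanatory variable, a binary pass outcome as the response):

```python
# Hypothetical example: logistic regression on a binary (0/1) outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: hours studied vs. whether the student passed
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5],
                  [3.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])  # outcome must be binary

model = LogisticRegression()
model.fit(hours, passed)

# The fitted curve outputs a probability, which is then thresholded
# into a black-or-white class prediction
prob = model.predict_proba([[4.0]])[0, 1]
print(f"P(pass | 4 hours) = {prob:.2f}")
print("Predicted class at 4 hours:", model.predict([[4.0]])[0])
```

Note that while the probabilities vary smoothly along the curve, the final classification is strictly one class or the other, matching the no-room-for-grey framing above.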


Note the tie between logistic and logarithmic. When…

A simple linear regression model is the easiest model to interpret.

Point. Blank. Period.

With this said, fitting data on a one-to-one variable basis limits our model’s representative ability and is not applicable in many cases.

Sometimes, to improve our model and better represent the data under consideration, we’ve got to incorporate more variables.

This leads, naturally, to the next step …

**Multiple linear regression**, or multiple regression for short, doesn’t overcome all the weaknesses of a simple linear regression model, but it does expand our capability of representation.

Accounting for more independent variables enables us to better represent the…
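As a minimal sketch of the step up from one explanatory variable to several, here is a multiple regression fit on made-up housing data (the features and prices are purely illustrative):

```python
# Hypothetical example: multiple linear regression with two
# explanatory variables instead of one.
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up features: square footage, number of bedrooms
X = np.array([[1000, 2], [1500, 3], [1200, 2],
              [1800, 3], [2400, 4], [2000, 3]])
y = np.array([200, 270, 230, 310, 400, 340])  # price in $1000s

model = LinearRegression().fit(X, y)
print("Coefficients:", model.coef_)   # one slope per explanatory variable
print("Intercept:   ", model.intercept_)
print("R-squared:   ", model.score(X, y))
```

The only mechanical change from the simple case is that `X` now has one column per independent variable, and the model learns one slope for each.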

When we say “elementary” linear regression, what we mean is that we’re taking the simplest principles, the building blocks of some idea (in this case, linear regression), and exploring these building blocks in their absolute simplest form.

For this reason, it’s more commonly referred to as *simple* linear regression.

Our application of linear regression is an attempt to not only explore but to quantify (put a number to) the relationship between a quantitative response variable and one or more explanatory variables.
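Putting a number to that relationship can be done with the closed-form least-squares estimates. A minimal sketch with made-up data:

```python
# Hypothetical example: quantifying the relationship between one
# explanatory variable x and a quantitative response y.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # explanatory variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # quantitative response

# slope = cov(x, y) / var(x); the intercept makes the fitted line
# pass through the point (mean of x, mean of y)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(f"y = {intercept:.2f} + {slope:.2f} * x")  # y = 0.14 + 1.96 * x
```

The slope is the quantified relationship: here, each one-unit increase in `x` is associated with roughly a 1.96-unit increase in the response.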

Before moving forward, let’s quickly review a couple definitions:

*Qualitative* variables are non-numerical and categorical (e.g. eye color) whereas…

Consultant | Master of Data Science candidate |“fulfilling lifestyle” writer @ http://www.magnusskonberg.com