There are times in our data science journey when we’re in the process of applying regression (whether linear or logistic) and we reach the crossroads of electing between model complexity and goodness of fit.

We’ve all been there. Where the feeling of discombobulation settles in and the questions rain down:

  • how do we decide between models?
  • how do we decide which variables to keep in our model?
  • how do we toe the line between casting too simple vs. too complex a model?, and
  • how do we decide between whole models that may contain completely different variables?
Photo by on

Founded on information theory…

Photo by on

What is logistic regression?

It’s a classification algorithm. A means of grouping observations based on shared characteristics.

More specifically, in the case of logistic regression, we classify based on whether something does or does not happen. It’s black or white, with no room for grey. While explanatory variables may vary, the outcome variable must be binary (0/1, True/False, Yes/No, Pass/Fail, etc.).

Thus, when we classify our data, we consider outcome variables that tend toward extreme ends and then makes a logarithmic line to aid in distinguishing between them.

For a refresher on variable definitions .

Note the tie between logistic and logarithmic. When…

Photo by on

A simple linear regression model is the easiest model to interpret.

Point. Blank. Period.

With this said, fitting data on a one-to-one variable basis limits our model’s representative ability and is not applicable in many cases.

Sometimes, to improve our model and better represent the data under consideration, we’ve got to incorporate more variables.

This leads, naturally, to the next step …

Multiple linear regression, or multi regression for short, doesn’t overcome all the weaknesses of a simple linear regression model, but it does expand our capability of representation.

Accounting for more independent variables enables us to better represent the…

Photo by on

What is elementary linear regression?

When we say “elementary” linear regression, what we mean is that we’re taking the simplest principles, the building blocks of some idea (in this case linear regression) and exploring these building blocks in their absolute, simplest form.

For this reason, it’s more commonly referred to as simple linear regression.

Our application of linear regression is an attempt to not only explore but to quantify (put a number to) the relationship between a quantitative response variable and one or more explanatory variables.

Before moving forward, let’s quickly review a couple definitions:

  1. Qualitative variables are non-numerical and categorical (ie. eye color) whereas…

Magnus PS

Consultant | Master of Data Science candidate |“fulfilling lifestyle” writer @

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store