
LINEAR REGRESSION

The purpose of regression modeling is to use scatterplots to observe and interpret trends in data, and then to turn those trends into real-life predictions and conclusions. Because a scatterplot displays every data point, it gives an accurate visual picture of any pattern relating the two variables. For example, if a statistician observes a strong, positive correlation in a scatterplot, they would have evidence to predict that one variable will continue to increase as the other increases.

For linear models, a good statistician looks for two features in the scatterplot: direction and strength. Is the trend positive or negative? Is it strong or weak? Sometimes the answers are unclear or ambiguous from the plot alone, so here are the numerical tools that back up the visual estimate:

R Value

Once a statistician recognizes a linear pattern, they can compute the correlation coefficient, or the “r value.” This is a number between -1 and 1 that measures the strength and direction of a linear relationship. The sign of the coefficient gives the direction: a positive coefficient indicates an increasing trend, and a negative coefficient indicates a decreasing one. The closer the value is to 1 or -1, the stronger the linear association. Thus, an “r value” near 0 indicates an extremely weak linear correlation, and a value of exactly 0 indicates no linear correlation at all.
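
To make the “r value” concrete, here is a minimal Python sketch that computes it from paired data using the standard deviation-from-the-mean formula. The function name `pearson_r` and the hours/scores data are made up for illustration; they are not from any particular textbook or data set.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 78, 90]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

For this made-up data the value comes out positive and close to 1, which matches what the scatterplot would show: a strong, increasing linear trend.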

Regression Equation

A statistician can also produce a regression equation to describe the model. Like the correlation coefficient, it takes the strength and direction of the data into account, but it goes a step further and gives a specific prediction of one variable from a given value of the other.
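
As a rough illustration, the sketch below fits a least-squares line of the form y = a + b*x and uses it to make a prediction. The function name `fit_line` and the data are assumptions chosen for this example, not part of the original lesson.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 78, 90]

a, b = fit_line(hours, scores)
predicted = a + b * 3.5  # predicted score for 3.5 hours of study
print(f"score = {a:.1f} + {b:.1f} * hours; prediction at 3.5 hours: {predicted:.1f}")
```

The slope says how much the predicted score changes for each additional hour, and the intercept is the predicted score at zero hours.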

 

As you probably know from regular math classes, not every data set is linear. One of the biggest mistakes with regression models is applying linear methods of interpretation to data that is non-linear. To avoid this, statisticians “linearize the data”: they apply a mathematical transformation that straightens the pattern so that it can be interpreted with linear methods.

 

Therefore, if you observe a correlation of 0, you shouldn’t assume that the variables are unrelated. Yes, there’s no linear correlation, but when you look at the graph you could find a very strong nonlinear relationship. Never count that out!

 

In addition, many people confuse correlation with the slope of a regression line, which is not accurate; a steeper slope, whether positive or negative, does not indicate a stronger correlation. Most importantly, regression models are just that: models. They are estimates that aim to mimic reality and predict the future, but like many areas of statistics, they’re not certain.

 

COMMON MISTAKES WITH REGRESSION

For more help with regression, visit www.stattrek.com and follow the tutorial.

 
