# The Basics of Linear Regression

October 11, 2019

This post was originally part of the DataRobot Community.

Linear regression is a way to model the relationship between a response variable and one or more explanatory variables. In linear regression, the data is modeled by a linear function.
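
For reference, the linear function can be written out explicitly. The notation below is standard textbook notation, not something taken from this post: $y_i$ is the response for observation $i$, $x_{i1}, \dots, x_{ip}$ are its explanatory variables, $\beta_0, \dots, \beta_p$ are the coefficients to estimate, and $\varepsilon_i$ is the error term.

$$
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i
$$

Ordinary least squares, the method behind R's `lm`, picks the coefficients that minimize the sum of squared residuals $\sum_i (y_i - \hat{y}_i)^2$. With $p = 1$ this is simple linear regression; with $p > 1$ it is multiple linear regression.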

## Simple linear regression (only one explanatory variable)

“mtcars” is a dataset built into R that records fuel consumption (“mpg”) and 10 other aspects of car design and performance for 32 cars. The first six rows are shown below.

data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Fitting a simple linear regression to the dataset, where the response variable is “mpg” (miles per gallon) and the explanatory variable is “wt” (weight, in units of 1,000 lb), and printing a summary of the fitted model.

slm <- lm(mpg ~ wt, data = mtcars)
summary(slm)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -4.543 -2.365 -0.125  1.410  6.873
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   37.285      1.878   19.86  < 2e-16 ***
## wt            -5.344      0.559   -9.56  1.3e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.05 on 30 degrees of freedom
## Multiple R-squared: 0.753,   Adjusted R-squared: 0.745
## F-statistic: 91.4 on 1 and 30 DF,  p-value: 1.29e-10
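
To see where the estimates in the summary come from, here is a minimal sketch that recomputes them from the closed-form least-squares formulas and then uses the fitted model for a prediction. The variable names (`slope`, `intercept`, `new_car`) and the example weight of 3 (that is, 3,000 lb) are illustrative choices, not part of the original analysis.

```r
data(mtcars)
slm <- lm(mpg ~ wt, data = mtcars)

# Closed-form least-squares estimates for one explanatory variable
slope <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)
intercept <- mean(mtcars$mpg) - slope * mean(mtcars$wt)

# These should agree with the estimates reported by lm()
print(c(intercept = intercept, slope = slope))
print(coef(slm))

# Predicted mpg for a hypothetical car with wt = 3
new_car <- data.frame(wt = 3)
pred <- predict(slm, newdata = new_car)
print(pred)

# In simple linear regression, R-squared equals the squared correlation
r_squared <- cor(mtcars$wt, mtcars$mpg)^2
print(r_squared)
```

Read this way, the slope of about -5.34 means that each additional 1,000 lb of weight is associated with roughly 5.3 fewer miles per gallon on average in this dataset.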

Plotting a graph of “wt” vs. “mpg” and adding a line of best fit.

plot(x = mtcars$wt, y = mtcars$mpg, main = "Car Weight vs. Car MPG", xlab = "Weight", ylab = "MPG", col = "blue")
abline(slm, col = "red")

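Plotting a residuals vs. fitted graph. The curve R draws on this plot is a smoothed trend rather than a straight line of best fit; for a well-specified linear model it should stay close to the horizontal line at 0. Here the residuals scatter around 0, apart from a few outliers that R labels by name.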

plot(slm, 1)
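
The quantities behind this plot can also be inspected directly. The sketch below (variable names are illustrative) extracts the residuals, checks them against their definition, and lists the observations with the largest absolute residuals.

```r
data(mtcars)
slm <- lm(mpg ~ wt, data = mtcars)

# A residual is the observed value minus the fitted value
res <- resid(slm)
manual_res <- mtcars$mpg - fitted(slm)
print(all.equal(unname(res), unname(manual_res)))

# With an intercept in the model, least-squares residuals sum to (numerically) zero
print(sum(res))

# The three observations with the largest absolute residuals
largest <- head(sort(abs(res), decreasing = TRUE), 3)
print(largest)
```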

Plotting a normal Q-Q graph of the standardized residuals. The points lie close to the diagonal line except for a few outliers in the upper tail. This is consistent with approximately normally distributed errors, an assumption behind the p-values that linear regression reports.

plot(slm, 2)
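
A Q-Q plot is a visual check. As a complement, one could run a formal normality test on the residuals. The sketch below uses the Shapiro-Wilk test from base R (`shapiro.test`), which is an addition here and not something the original analysis used. With only 32 observations the test has limited power, so the result is best treated as a rough guide.

```r
data(mtcars)
slm <- lm(mpg ~ wt, data = mtcars)

# Standardized residuals, the same quantity shown on the Q-Q plot
std_res <- rstandard(slm)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
sw <- shapiro.test(std_res)
print(sw)

# A small p-value (e.g. below 0.05) would be evidence against normality
print(sw$p.value)
```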

## Multiple linear regression (multiple explanatory variables)

library(scatterplot3d)

Fitting the “mtcars” dataset with a multiple linear regression model, where the response variable is “mpg” and the explanatory variables are “wt” (weight) and “disp” (displacement), and then printing a summary of the fitted model.

mlm <- lm(mpg ~ wt + disp, data = mtcars)
summary(mlm)
##
## Call:
## lm(formula = mpg ~ wt + disp, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -3.409 -2.324 -0.768  1.772  6.348
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.96055    2.16454   16.15  4.9e-16 ***
## wt          -3.35083    1.16413   -2.88   0.0074 **
## disp        -0.01772    0.00919   -1.93   0.0636 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.92 on 29 degrees of freedom
## Multiple R-squared: 0.781,   Adjusted R-squared: 0.766
## F-statistic: 51.7 on 2 and 29 DF,  p-value: 2.74e-10
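
One way to judge whether adding “disp” is worthwhile is to compare the two nested models with an F-test, and to check how strongly the two explanatory variables are correlated with each other. This is a sketch using base R's `anova` and `cor`; it is an extra check, not part of the original analysis, and the variable names are illustrative.

```r
data(mtcars)
slm <- lm(mpg ~ wt, data = mtcars)
mlm <- lm(mpg ~ wt + disp, data = mtcars)

# F-test comparing the nested models; with one added variable this
# gives the same p-value as the t-test for disp in summary(mlm)
cmp <- anova(slm, mlm)
print(cmp)
p_disp <- cmp[["Pr(>F)"]][2]
print(p_disp)

# Correlation between the two explanatory variables
wt_disp_cor <- cor(mtcars$wt, mtcars$disp)
print(wt_disp_cor)
```

Weight and displacement are strongly correlated in this dataset, which helps explain why the standard error of the “wt” coefficient roughly doubles (0.559 to 1.164) and its estimate moves from -5.344 to -3.351 once “disp” is included.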

Plotting a graph of “wt” and “disp” vs. “mpg” that displays residual errors.

# Draw the 3D scatterplot of the observed data
s3d <- scatterplot3d(x = mtcars$wt, y = mtcars$disp, z = mtcars$mpg, main = "Car Weight and Car Displacement vs. Car MPG with Residuals", xlab = "Weight", ylab = "Displacement", zlab = "MPG", color = "blue", pch = 20)
# Add the fitted regression plane
s3d$plane3d(mlm, lty = "dotted")
# Project the observed and fitted points into 2D plot coordinates
orig <- s3d$xyz.convert(mtcars$wt, mtcars$disp, mtcars$mpg)
plane <- s3d$xyz.convert(mtcars$wt, mtcars$disp, fitted(mlm))
# Index 1 for negative residuals (blue, dashed), 2 for positive residuals (red, solid)
i.negpos <- 1 + (resid(mlm) > 0)
segments(orig$x, orig$y, plane$x, plane$y, col = c("blue", "red")[i.negpos], lty = (2:1)[i.negpos])

