The Basics of Linear Regression
This post was originally part of the DataRobot Community.
Linear regression is a way to model the relationship between a response variable and one or more explanatory variables. In linear regression, the data is modeled by a linear function.
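As a sketch of what "modeled by a linear function" means, the model can be written as below. The notation is introduced here for illustration and does not come from the original post: y is the response, the x terms are the explanatory variables, the β terms are the coefficients to be estimated, and ε is the error term.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n
```

With one explanatory variable (p = 1) this is simple linear regression; with p > 1 it is multiple linear regression.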
Package(s) needed: “scatterplot3d” (license: GPL-2)
Simple linear regression (only one explanatory variable)
“mtcars” is a dataset built into R that contains fuel consumption and other aspects of car design and performance for 32 cars.
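The command that produced the preview of the data is not shown in the post. Assuming it is the first six rows of the dataset, a call like the following would print it:

```r
# Show the first six rows of the built-in mtcars data frame.
# Assumption: the preview in the post was produced this way.
head(mtcars)
```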
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Fitting a simple linear regression to the dataset, with “mpg” as the response variable and “wt” (weight) as the explanatory variable, and printing a summary of the model.
slm <- lm(mpg ~ wt, data = mtcars)
summary(slm)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -4.543 -2.365 -0.125  1.410  6.873
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   37.285      1.878   19.86  < 2e-16 ***
## wt            -5.344      0.559   -9.56  1.3e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.05 on 30 degrees of freedom
## Multiple R-squared:  0.753,  Adjusted R-squared:  0.745
## F-statistic: 91.4 on 1 and 30 DF,  p-value: 1.29e-10
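One way to read the coefficient table is to pull the estimates out of the model and use them for a prediction. The sketch below is an illustration rather than part of the original post. The names `new_car` and `pred` are hypothetical, and the weight of 3 is an arbitrary example value (in mtcars, “wt” is recorded in units of 1000 lbs).

```r
# Refit the simple model so this example is self-contained.
slm <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope: the slope is the estimated change in mpg
# for each additional 1000 lbs of weight.
coef(slm)

# Predict mpg for a hypothetical car with wt = 3 (i.e. 3000 lbs).
new_car <- data.frame(wt = 3)
pred <- predict(slm, newdata = new_car)
pred
```

The prediction is the intercept plus the slope multiplied by 3, which with the estimates above comes to roughly 21 mpg.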
Plotting a graph of “wt” vs. “mpg” and adding a line of best fit.
plot(x = mtcars$wt, y = mtcars$mpg,
     main = "Car Weight vs. Car MPG",
     xlab = "Weight", ylab = "MPG", col = "blue")
abline(slm, col = "red")
Plotting a residuals vs. fitted graph with a line of best fit. The residuals are generally close to 0 except for a few outliers.
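The code for this plot is not shown in the post. A minimal sketch, assuming the built-in diagnostic plots for `lm` objects were used, might look like this:

```r
# Refit the simple model so this example is self-contained.
slm <- lm(mpg ~ wt, data = mtcars)

# which = 1 selects the "Residuals vs Fitted" diagnostic plot.
# R adds a smoothed trend line and labels the most extreme points.
plot(slm, which = 1)
```

If the trend line stays roughly flat around 0, the linear form is a reasonable description of the data; a clear curve would suggest a missing nonlinear term.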
Plotting a normal Q-Q graph of the standardized residuals. The residuals are generally close to the diagonal line except for a few outliers. This suggests the errors are approximately normally distributed, an assumption that the standard errors and p-values reported above rely on.
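The code for the Q-Q plot is also not shown. Under the same assumption that the built-in `lm` diagnostics were used, a sketch could be:

```r
# Refit the simple model so this example is self-contained.
slm <- lm(mpg ~ wt, data = mtcars)

# which = 2 selects the normal Q-Q plot of the standardized residuals.
plot(slm, which = 2)

# Roughly equivalent plot built by hand from the standardized residuals.
std_res <- rstandard(slm)
qqnorm(std_res)
qqline(std_res)
```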
Multiple linear regression (multiple explanatory variables)
Loading the library for 3D scatterplot.
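The loading step itself is missing from the post. Assuming the “scatterplot3d” package listed at the top is the one meant, it would be loaded like this:

```r
# Install once if the package is not already available:
# install.packages("scatterplot3d")

# Attach the package so that scatterplot3d() can be called directly.
library(scatterplot3d)
```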
Fitting a multiple linear regression model to the “mtcars” dataset, with “mpg” as the response variable and “wt” (weight) and “disp” (displacement) as the explanatory variables, and then printing a summary of the model.
mlm <- lm(mpg ~ wt + disp, data = mtcars)
summary(mlm)
##
## Call:
## lm(formula = mpg ~ wt + disp, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -3.409 -2.324 -0.768  1.772  6.348
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.96055    2.16454   16.15  4.9e-16 ***
## wt          -3.35083    1.16413   -2.88   0.0074 **
## disp        -0.01772    0.00919   -1.93   0.0636 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.92 on 29 degrees of freedom
## Multiple R-squared:  0.781,  Adjusted R-squared:  0.766
## F-statistic: 51.7 on 2 and 29 DF,  p-value: 2.74e-10
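To check whether adding “disp” improves on the weight-only model, the two nested fits can be compared with an F-test. This comparison is not part of the original post; it is a sketch that refits both models, and the names `cmp` and `p_value` are hypothetical.

```r
# Refit both models so this example is self-contained.
slm <- lm(mpg ~ wt, data = mtcars)
mlm <- lm(mpg ~ wt + disp, data = mtcars)

# F-test comparing the smaller model against the larger one.
cmp <- anova(slm, mlm)
print(cmp)

# p-value for the added term (second row of the comparison table).
p_value <- cmp[["Pr(>F)"]][2]
p_value
```

Because the models differ by a single term, this p-value matches the one reported for “disp” in the coefficient table (about 0.064), so the evidence that displacement adds information beyond weight is weak at the 0.05 level.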
Plotting a graph of “wt” and “disp” vs. “mpg” that displays residual errors.
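# 3D scatterplot of weight and displacement against mpg
s3d <- scatterplot3d(x = mtcars$wt, y = mtcars$disp, z = mtcars$mpg,
                     main = "Car Weight and Car Displacement vs. Car MPG with Residuals",
                     xlab = "Weight", ylab = "Displacement", zlab = "MPG",
                     color = "blue", pch = 20)
# Draw the fitted regression plane
s3d$plane3d(mlm, lty = "dotted")
# Project the observed points and their fitted values onto the 2D plot
orig <- s3d$xyz.convert(mtcars$wt, mtcars$disp, mtcars$mpg)
plane <- s3d$xyz.convert(mtcars$wt, mtcars$disp, fitted(mlm))
# Index 1 for non-positive residuals, 2 for positive residuals
i.negpos <- 1 + (resid(mlm) > 0)
# Connect each point to the plane: blue dashed below it, red solid above it
segments(orig$x, orig$y, plane$x, plane$y,
         col = c("blue", "red")[i.negpos], lty = (2:1)[i.negpos])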
Plotting 3D scatterplots with residuals
DataRobot Documentation portal (Regression Problems section)