The programming language R offers the following functions for fitting linear models:
1. lm – Used to fit linear models
This function uses the following syntax:
lm(formula, data, ...)
where:
- formula: The formula for the linear model (e.g. y ~ x1 + x2)
- data: The name of the data frame that contains the data
2. glm – Used to fit generalized linear models
This function uses the following syntax:
glm(formula, family=gaussian, data, ...)
where:
- formula: The formula for the linear model (e.g. y ~ x1 + x2)
- family: The statistical family to use to fit the model. Default is gaussian but other options include binomial, Gamma, and poisson among others.
- data: The name of the data frame that contains the data
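To make the family argument a little more concrete, here is a minimal sketch that inspects the family objects named above. Each family in base R carries a default link function; the variable names (fam_gaussian, links, and so on) are only illustrative and do not come from the original tutorial.

```r
# Each family is a function in the stats package that returns a "family" object
fam_gaussian <- gaussian()
fam_binomial <- binomial()
fam_poisson  <- poisson()
fam_gamma    <- Gamma()

# Collect the default link function used by each family
links <- c(
  gaussian = fam_gaussian$link,
  binomial = fam_binomial$link,
  poisson  = fam_poisson$link,
  Gamma    = fam_gamma$link
)
print(links)

# The inverse link maps the linear predictor back to the scale of the mean
p_at_zero  <- fam_binomial$linkinv(0)  # inverse logit of 0
mu_at_zero <- fam_poisson$linkinv(0)   # exp(0)
```

The gaussian family uses the identity link by default, which is why glm() with its default family fits an ordinary linear regression.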
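Note that, in terms of syntax, the main difference between these two functions is the family argument included in the glm() function.
If you use lm() or glm() with the default gaussian family to fit a linear regression model, they will produce the same coefficient estimates and standard errors. The summaries differ slightly: lm() reports R-squared and an F-statistic, while glm() reports deviance and AIC.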
However, the glm() function can also be used to fit more complex models like:
- Logistic regression (family=binomial)
- Poisson regression (family=poisson)
The following examples show how to use the lm() function and glm() function in practice.
Example of Using the lm() Function
The following code shows how to fit a linear regression model using the lm() function:
#fit multiple linear regression model
model <- lm(mpg ~ disp + hp, data=mtcars)
#view model summary
summary(model)
Call:
lm(formula = mpg ~ disp + hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.7945 -2.3036 -0.8246  1.8582  6.9363

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.735904   1.331566  23.083  < 2e-16 ***
...
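Rather than reading values off the printed summary, the individual pieces can be pulled out of the fitted object. The following is a small sketch using standard base R accessors; the variable names (coefs, coef_table, r2) are illustrative choices, not part of the original example.

```r
# Refit the same model so this snippet runs on its own
model <- lm(mpg ~ disp + hp, data = mtcars)

# Named vector of coefficient estimates
coefs <- coef(model)

# Matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(model))

# R-squared is reported by summary() for lm objects
r2 <- summary(model)$r.squared

print(coef_table)
print(r2)
```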
Examples of Using the glm() Function
The following code shows how to fit the exact same linear regression model using the glm() function:
#fit multiple linear regression model
model <- glm(mpg ~ disp + hp, data=mtcars)
#view model summary
summary(model)
Call:
glm(formula = mpg ~ disp + hp, data = mtcars)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-4.7945 -2.3036 -0.8246  1.8582  6.9363

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.735904   1.331566  23.083  < 2e-16 ***
...
Notice that the coefficient estimates and their standard errors are exactly the same as those produced by the lm() function.
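The agreement between the two fits can also be checked programmatically instead of by eye. The sketch below is one way to do it with all.equal(); the object names (fit_lm, fit_glm, same_coefs, and so on) are made up for this illustration.

```r
# Fit the same model with both functions
fit_lm  <- lm(mpg ~ disp + hp, data = mtcars)
fit_glm <- glm(mpg ~ disp + hp, data = mtcars)  # family defaults to gaussian

# Compare coefficient estimates (up to numerical tolerance)
same_coefs <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))

# Compare standard errors taken from the two coefficient tables
se_lm  <- coef(summary(fit_lm))[, "Std. Error"]
se_glm <- coef(summary(fit_glm))[, "Std. Error"]
same_se <- isTRUE(all.equal(se_lm, se_glm))

# For the gaussian family, the residual deviance is the residual sum of squares
same_rss <- isTRUE(all.equal(deviance(fit_glm), sum(residuals(fit_lm)^2)))

print(c(coefs = same_coefs, std_errors = same_se, rss = same_rss))
```

The comparison uses all.equal() rather than == because the two functions use different fitting routines, so tiny floating-point differences are possible.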
Note that we can also use the glm() function to fit a logistic regression model by specifying family=binomial as follows:
#fit logistic regression model
model <- glm(am ~ disp + hp, data=mtcars, family=binomial)
#view model summary
summary(model)
Call:
glm(formula = am ~ disp + hp, family = binomial, data = mtcars)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9665  -0.3090  -0.0017   0.3934   1.3682

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.40342    1.36757   1.026   0.3048
disp        -0.09518    0.04800  -1.983   0.0474 *
hp           0.12170    0.06777   1.796   0.0725 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 16.713  on 29  degrees of freedom
AIC: 22.713

Number of Fisher Scoring iterations: 8
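The logistic regression coefficients above are on the log-odds scale, so it often helps to convert them before interpreting them. The following sketch shows one common approach; the new_car data frame and its values are hypothetical inputs chosen only for illustration.

```r
# Refit the logistic regression model so this snippet runs on its own
model <- glm(am ~ disp + hp, data = mtcars, family = binomial)

# Exponentiate the coefficients to get odds ratios
odds_ratios <- exp(coef(model))
print(odds_ratios)

# Predicted probabilities for the rows used to fit the model
probs <- predict(model, type = "response")

# Predicted probability for a hypothetical new car (illustrative values)
new_car <- data.frame(disp = 160, hp = 110)
p_new <- predict(model, newdata = new_car, type = "response")
print(p_new)
```

Without type = "response", predict() returns values on the log-odds scale rather than probabilities.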
We can also use the glm() function to fit a Poisson regression model by specifying family=poisson as follows:
#fit Poisson regression model
model <- glm(am ~ disp + hp, data=mtcars, family=poisson)
#view model summary
summary(model)
Call:
glm(formula = am ~ disp + hp, family = poisson, data = mtcars)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.1266  -0.4629  -0.2453   0.1797   1.5428

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.214255   0.593463   0.361  0.71808
disp        -0.018915   0.007072  -2.674  0.00749 **
hp           0.016522   0.007163   2.307  0.02107 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 23.420  on 31  degrees of freedom
Residual deviance: 10.526  on 29  degrees of freedom
AIC: 42.526

Number of Fisher Scoring iterations: 6
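Because the Poisson family uses a log link by default, the coefficients above describe multiplicative changes in the expected value of the response. Here is a short sketch of how the fitted object might be inspected; the variable names (rate_ratios, mu_hat, dispersion_ratio) are illustrative, and the deviance-to-degrees-of-freedom ratio is only a rough, informal check.

```r
# Refit the Poisson regression model so this snippet runs on its own
model <- glm(am ~ disp + hp, data = mtcars, family = poisson)

# Exponentiate the coefficients to get multiplicative effects on the mean
rate_ratios <- exp(coef(model))
print(rate_ratios)

# Fitted means on the response scale (always positive under the log link)
mu_hat <- fitted(model)

# Residual deviance divided by residual degrees of freedom
dispersion_ratio <- deviance(model) / df.residual(model)
print(dispersion_ratio)
```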
Additional Resources
How to Perform Simple Linear Regression in R
How to Perform Multiple Linear Regression in R
How to Use the predict function with glm in R