
**Multiple linear regression** is a method we can use to understand the relationship between two or more explanatory variables and a response variable.

This tutorial explains how to perform multiple linear regression in SPSS.

**Example: Multiple Linear Regression in SPSS**

Suppose we want to know whether the number of hours spent studying and the number of prep exams taken affect the score a student receives on a certain exam. To explore this, we can perform multiple linear regression using the following variables:

**Explanatory variables:**

- Hours studied
- Prep exams taken

**Response variable:**

- Exam score

Use the following steps to perform this multiple linear regression in SPSS.

**Step 1: Enter the data.**

Enter the following data for the number of hours studied, prep exams taken, and exam score received for 20 students:

**Step 2: Perform multiple linear regression.**

Click the **Analyze** tab, then **Regression**, then **Linear**:

Drag the variable **score** into the box labelled Dependent. Drag the variables **hours** and **prep_exams** into the box labelled Independent(s). Then click **OK**.
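SPSS fits the model as soon as you click **OK**. As a rough illustration of what that fit computes, here is a minimal pure-Python sketch of ordinary least squares for two predictors, assuming we solve the normal equations directly with Gaussian elimination. This is not SPSS's internal algorithm, but when the predictors are not collinear the least-squares coefficients are unique, so any correct method should match SPSS up to rounding. The function names `solve` and `fit_ols` and the example data are hypothetical, not the 20 students from this tutorial.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update a and b together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(hours, prep_exams, score):
    """Return [constant, b_hours, b_prep_exams] for score ~ hours + prep_exams."""
    # Design matrix: a column of ones for the constant, then the two predictors.
    X = [[1.0, h, p] for h, p in zip(hours, prep_exams)]
    k = len(X[0])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, score)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data for illustration only.
hours = [1, 2, 2, 3, 4, 5, 6, 6]
prep_exams = [1, 3, 0, 2, 4, 1, 3, 2]
score = [68, 70, 73, 75, 79, 86, 88, 91]
print(fit_ols(hours, prep_exams, score))
```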

**Step 3: Interpret the output.**

Once you click **OK**, the results of the multiple linear regression will appear in a new window.

The first table we're interested in is titled **Model Summary**:

Here is how to interpret the most relevant numbers in this table:

- **R Square:** This is the proportion of the variance in the response variable that can be explained by the explanatory variables. In this example, **73.4%** of the variation in exam scores can be explained by hours studied and number of prep exams taken.
- **Std. Error of the Estimate:** This measures the typical distance between the observed values and the regression line (it is the square root of the residual mean square). In this example, the observed values typically fall about **5.3657** units from the regression line.
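Both numbers in the Model Summary come from the residuals of the fitted model. The sketch below assumes you already have the observed scores and the model's fitted values, and that the model includes a constant (the formula R Square = 1 - SSE/SST relies on that). The function name `model_summary` is hypothetical.

```python
import math


def model_summary(observed, fitted, num_predictors):
    """Return (r_squared, std_error_of_estimate) for a least-squares fit with a constant."""
    n = len(observed)
    mean_y = sum(observed) / n
    # Residual sum of squares: variation the model leaves unexplained.
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    # Total sum of squares: variation of the response around its mean.
    sst = sum((y - mean_y) ** 2 for y in observed)
    r_squared = 1 - sse / sst
    # Residual degrees of freedom: n minus all estimated coefficients (predictors + constant).
    df_residual = n - num_predictors - 1
    std_error_of_estimate = math.sqrt(sse / df_residual)
    return r_squared, std_error_of_estimate
```

For this tutorial's model you would pass `num_predictors=2`, giving 20 - 2 - 1 = 17 residual degrees of freedom.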

The next table we're interested in is titled **ANOVA**:

Here is how to interpret the most relevant numbers in this table:

- **F:** This is the overall F statistic for the regression model, calculated as Mean Square Regression / Mean Square Residual.
- **Sig:** This is the p-value associated with the overall F statistic. It tells us whether the regression model as a whole is statistically significant; in other words, whether the two explanatory variables combined have a statistically significant association with the response variable. In this case SPSS reports the p-value as .000, which means it is less than .0005 (SPSS rounds to three decimal places). This indicates that the explanatory variables hours studied and prep exams taken have a statistically significant association with exam score.
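The F statistic can be rebuilt from the ANOVA sums of squares, or equivalently from R Square, since SS Regression / SS Total equals R Square. The sketch below shows both forms; the function names are hypothetical. If SciPy is installed, `scipy.stats.f.sf(F, df_regression, df_residual)` gives the corresponding p-value (the **Sig** column).

```python
def f_statistic(ss_regression, ss_residual, n, num_predictors):
    """Overall F = Mean Square Regression / Mean Square Residual, plus its degrees of freedom."""
    df_regression = num_predictors        # one df per explanatory variable
    df_residual = n - num_predictors - 1  # n minus all estimated coefficients
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual, df_regression, df_residual


def f_from_r_squared(r_squared, n, num_predictors):
    """The same F statistic written in terms of R Square."""
    return (r_squared / num_predictors) / ((1 - r_squared) / (n - num_predictors - 1))


# Approximate check for this example: R Square = .734, 20 students, 2 predictors.
# Only approximate, because R Square is rounded to three decimals in the output.
print(f_from_r_squared(0.734, 20, 2))
```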

The next table we're interested in is titled **Coefficients**:

Here is how to interpret the most relevant numbers in this table:

- **Unstandardized B (Constant):** This is the estimated average value of the response variable when both explanatory variables are zero. In this example, the estimated average exam score is **67.674** when hours studied and prep exams taken are both equal to zero.
- **Unstandardized B (hours):** This is the average change in exam score associated with a one-unit increase in hours studied, holding the number of prep exams taken constant. In this case, each additional hour spent studying is associated with an increase of **5.556** points in exam score.
- **Unstandardized B (prep_exams):** This is the average change in exam score associated with a one-unit increase in prep exams taken, holding the number of hours studied constant. In this case, each additional prep exam taken is associated with a decrease of **0.602** points in exam score.
- **Sig. (hours):** This is the p-value for the explanatory variable **hours**. Since this value (.000) is less than .05, we can conclude that hours studied has a statistically significant association with exam score.
- **Sig. (prep_exams):** This is the p-value for the explanatory variable **prep_exams**. Since this value (.519) is not less than .05, we cannot conclude that the number of prep exams taken has a statistically significant association with exam score.
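To see what "holding the other variable constant" means, we can change one predictor by one unit while keeping the other fixed and compare the model's estimates. This sketch uses the unstandardized coefficients reported in the Coefficients table; the helper name `estimated_score` is hypothetical. Because the model is linear with no interaction term, the differences are the same no matter which starting values you pick.

```python
# Unstandardized coefficients from the Coefficients table in this example.
CONSTANT = 67.674
B_HOURS = 5.556
B_PREP_EXAMS = -0.602


def estimated_score(hours, prep_exams):
    """Estimated exam score from the fitted model."""
    return CONSTANT + B_HOURS * hours + B_PREP_EXAMS * prep_exams


# One more hour of study, prep exams held at 2: the change equals B_HOURS.
hours_effect = estimated_score(4, 2) - estimated_score(3, 2)
# One more prep exam, hours held at 3: the change equals B_PREP_EXAMS.
prep_effect = estimated_score(3, 3) - estimated_score(3, 2)
print(round(hours_effect, 3), round(prep_effect, 3))
```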

Lastly, we can form a regression equation using the values shown in the table for **constant**, **hours**, and **prep_exams**. In this case, the equation would be:

Estimated exam score = 67.674 + 5.556*(hours) - 0.602*(prep_exams)

We can use this equation to find the estimated exam score for a student, based on the number of hours they studied and the number of prep exams they took. For example, a student who studies for 3 hours and takes 2 prep exams is expected to receive an exam score of 83.1:

Estimated exam score = 67.674 + 5.556*(3) - 0.602*(2) = 83.1
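Written out term by term, the arithmetic behind this prediction is:

$$
\begin{aligned}
\widehat{\text{score}} &= 67.674 + 5.556(3) - 0.602(2) \\
&= 67.674 + 16.668 - 1.204 \\
&= 83.138 \approx 83.1
\end{aligned}
$$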

**Note:** Since the explanatory variable **prep exams** was not found to be statistically significant, we may decide to remove it from the model and instead perform simple linear regression using **hours studied** as the only explanatory variable.
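With a single explanatory variable, the least-squares line has a closed form: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). Below is a minimal sketch under that formula; the function name and data are hypothetical, not this tutorial's dataset. Note that the hours coefficient from the reduced model will generally differ somewhat from 5.556 unless hours studied and prep exams taken are uncorrelated in the data.

```python
def simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical hours studied and exam scores, for illustration only.
hours = [1, 2, 2, 3, 4, 5, 6, 6]
score = [68, 70, 73, 75, 79, 86, 88, 91]
print(simple_linear_regression(hours, score))
```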