
**Logistic Regression** is a method that we use to fit a regression model when the response variable is binary. Here are some examples of when we may use logistic regression:

- We want to know how exercise, diet, and weight impact the probability of having a heart attack. The response variable is *heart attack* and it has two potential outcomes: a heart attack occurs or does not occur.
- We want to know how GPA, ACT score, and number of AP classes taken impact the probability of getting accepted into a particular university. The response variable is *acceptance* and it has two potential outcomes: accepted or not accepted.
- We want to know whether word count and email title impact the probability that an email is spam. The response variable is *spam* and it has two potential outcomes: spam or not spam.

This tutorial explains how to perform logistic regression in Stata.

**Example: Logistic Regression in Stata**

Suppose we are interested in understanding whether a mother's age and her smoking habits affect the probability of having a baby with a low birthweight.

To explore this, we can perform logistic regression using age and smoking (either yes or no) as explanatory variables and low birthweight (either yes or no) as a response variable. Since the response variable is binary (there are only two possible outcomes), it is appropriate to use logistic regression.

Perform the following steps in Stata to conduct a logistic regression using the dataset called *lbw*, which contains data on 189 different mothers.

**Step 1: Load the data.**

Load the data by typing the following into the Command box:

use http://www.stata-press.com/data/r13/lbw

**Step 2: Get a summary of the data.**

Gain a quick understanding of the data you're working with by typing the following into the Command box:

summarize

We can see that there are 11 different variables in the dataset, but the only three that we care about are the following:

- **low**: whether or not the baby had a low birthweight. 1 = yes, 0 = no.
- **age**: age of the mother.
- **smoke**: whether or not the mother smoked during pregnancy. 1 = yes, 0 = no.
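
To look at just these three variables rather than all 11, something like the following can be typed into the Command box. This is a minimal sketch that assumes the *lbw* dataset from Step 1 is already loaded; the particular choice of commands is only a suggestion.

```stata
* Show the storage type and label of the three variables of interest
describe low age smoke

* Summary statistics for just these variables
summarize low age smoke

* Cross-tabulate low birthweight against smoking status
tabulate low smoke
```

The cross-tabulation gives a first, unadjusted look at how low birthweight is distributed among smokers and non-smokers before any model is fit.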

**Step 3: Perform logistic regression.**

Type the following into the Command box to perform logistic regression using *age* and *smoke* as explanatory variables and *low* as the response variable.

logit low age smoke
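
For reference, the model that this command fits can be written in the standard logistic regression form shown here. The symbols $p$, $\beta_0$, $\beta_1$, and $\beta_2$ are notation introduced for illustration and do not appear in the Stata output.

$$
\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{smoke}
$$

Here $p$ is the probability that *low* = 1. The coefficients are on the log-odds scale, so exponentiating a coefficient gives the factor by which the odds change for a one-unit increase in that variable, holding the other variable constant.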

Here is how to interpret the most interesting numbers in the output:

**Coef (age):** -.0497792. Holding *smoke* constant, each one-year increase in age multiplies the odds of a baby having low birthweight by exp(-.0497792) = .951. Because this number is less than 1, an increase in age is associated with a decrease in the odds of having a baby with low birthweight (about 4.9% lower odds per additional year).

For example, suppose mother A and mother B are both smokers. If mother A is one year older than mother B, then the odds that mother A has a low birthweight baby are just 95.1% of the odds that mother B has a low birthweight baby.
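
The conversion from coefficient to odds ratio can be checked in Stata itself. The sketch below assumes the *lbw* data are still loaded; the first command uses the coefficient value quoted above, and the second refits the same model with the `or` option so that Stata reports odds ratios instead of coefficients.

```stata
* Exponentiate the age coefficient from the output above
display exp(-.0497792)

* Or refit the model and have Stata report odds ratios directly
logit low age smoke, or
```

The `display` result should round to the .951 figure used above. The `logistic` command is an alternative that fits the same model and shows odds ratios by default.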

**P>|z| (age):** 0.119. This is the p-value associated with the test statistic for *age*. Since this value is not less than 0.05, age is not a statistically significant predictor of low birthweight.

**Coef (smoke):** .6918486. Holding *age* constant, a mother who smokes during pregnancy has exp(.6918486) = 1.997 times the odds of having a baby with low birthweight compared to a mother who does not smoke during pregnancy.

For example, suppose mother A and mother B are both 30 years old. If mother A smokes during pregnancy and mother B does not, then the odds that mother A has a low birthweight baby are 99.7% higher than the odds that mother B has a low birthweight baby.
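
Odds can be harder to picture than probabilities, so it may help to look at the predicted probabilities for the two mothers in this example. The following is a sketch that assumes `logit low age smoke` is the most recent estimation command; the `margins` command then reports the predicted probability of low birthweight at the covariate values given in `at()`.

```stata
* Assumes logit low age smoke is the most recent estimation command
* Predicted probability of low birthweight at age 30, non-smoker vs. smoker
margins, at(age=30 smoke=(0 1))
```

If each predicted probability p is converted to odds as p / (1 - p), the ratio of the smoker's odds to the non-smoker's odds should come out close to the 1.997 figure above.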

**P>|z| (smoke):** 0.032. This is the p-value associated with the test statistic for *smoke*. Since this value is less than 0.05, *smoke* is a statistically significant predictor of low birthweight.
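
The p-values in the output come from a Wald z-test of each coefficient against zero, and they can be reproduced after fitting the model. The sketch below assumes `logit low age smoke` is the most recent estimation command; `_b[smoke]` and `_se[smoke]` are the stored coefficient and standard error.

```stata
* Assumes logit low age smoke is the most recent estimation command
* Wald test that the smoke coefficient equals zero
test smoke

* Recompute the two-sided p-value from the coefficient and its standard error
display 2*normal(-abs(_b[smoke]/_se[smoke]))
```

`test` reports a chi-squared statistic, which for a single coefficient is the square of the z statistic, so its p-value should agree with the P>|z| column. Replacing `smoke` with `age` runs the same check for the other predictor.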

**Step 4: Report the results.**

Lastly, we want to report the results of our logistic regression. Here is an example of how to do so:

A logistic regression was performed to determine whether a mother's age and her smoking habits affect the probability of having a baby with a low birthweight. A sample of 189 mothers was used in the analysis.


Results showed that there was a statistically significant relationship between smoking and probability of low birthweight (z = 2.15, p = .032), while there was not a statistically significant relationship between age and probability of low birthweight (z = -1.56, p = .119).