
A **dichotomous variable** is a type of variable that only takes on two possible values.

Some examples of dichotomous variables include:

- Gender: Male or Female
- Coin Flip: Heads or Tails
- Property Type: Residential or Commercial
- Athlete Status: Professional or Amateur
- Exam Results: Pass or Fail

These types of variables occur all the time in practice. For example, consider the following dataset that contains 10 observations and 4 variables:

The variables **Gender** and **Won Championship** are dichotomous because each can take on only two possible values.

However, the variables **Division** and **Average Points** are not dichotomous because they can take on more than two values.
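As a rough illustration, the check below tests whether a column contains exactly two distinct observed values. This is a minimal Python sketch. The function name `is_dichotomous` and the sample columns are hypothetical and not taken from the dataset above, and missing values (`None`) are skipped by assumption. The check only inspects values that actually appear in the data, so a variable that could in principle take two values but shows only one in a small sample returns `False`.

```python
from typing import Hashable, Iterable


def is_dichotomous(values: Iterable[Hashable]) -> bool:
    """Return True if the observed values contain exactly two distinct categories.

    Missing values (None) are ignored before counting (an assumption of this sketch).
    """
    distinct = {v for v in values if v is not None}
    return len(distinct) == 2


# Hypothetical columns, loosely modeled on the kinds of variables described above
won_championship = ["Yes", "No", "No", "Yes", "No"]
division = ["A", "B", "C", "A", "B"]
average_points = [12.5, 18.0, 9.3, 22.1, 15.7]

print(is_dichotomous(won_championship))  # True: only "Yes" and "No"
print(is_dichotomous(division))          # False: three divisions
print(is_dichotomous(average_points))    # False: many distinct numbers
```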

Bonus Tip:

To remember that dichotomous variables take on only two values, note that the Greek prefix “di-” means “two,” “twice,” or “double.”

**How to Create Dichotomous Variables**

It’s worth noting that we can create a dichotomous variable from a continuous variable by splitting its values at a chosen threshold.

For example, in the previous dataset we could turn the variable **Average Points** into a dichotomous variable by classifying players with an average above 15 as “high scorers” and all other players (an average of 15 or below) as “low scorers.”
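A minimal Python sketch of this thresholding step is shown below. The helper name `dichotomize`, its label arguments, and the sample values are hypothetical. Players with an average of exactly 15 are assigned to “low scorer” here, which is an arbitrary boundary choice; any threshold rule works as long as every value falls into exactly one of the two groups.

```python
def dichotomize(values, threshold, high_label="high scorer", low_label="low scorer"):
    """Map each numeric value to one of two labels based on a threshold.

    Values strictly greater than the threshold get high_label; all others
    (including values exactly equal to the threshold) get low_label.
    """
    return [high_label if v > threshold else low_label for v in values]


# Hypothetical average-points values (not the article's dataset)
average_points = [12.5, 18.0, 9.3, 22.1, 15.0]
labels = dichotomize(average_points, threshold=15)
print(labels)
# ['low scorer', 'high scorer', 'low scorer', 'high scorer', 'low scorer']
```

In pandas, `pd.cut` with two bins can perform the same split on a whole column.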

**How to Visualize Dichotomous Variables**

We typically visualize a dichotomous variable with a simple bar chart that shows the frequency of each of its two values.

For example, the following bar chart shows the frequencies of each gender in the previous dataset:

We could also display the frequencies as percentages on the y-axis:

This allows us to easily see that 70% of the total athletes in the dataset are male and 30% are female.
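The counts and percentages behind such a bar chart can be computed in a few lines. This sketch uses only the Python standard library and prints a crude text bar chart. The gender column is hypothetical, built with 7 males and 3 females only to mirror the 70%/30% split stated above, and `frequency_table` is an illustrative helper name. In practice you would likely pass these counts to a plotting library such as matplotlib (`plt.bar`).

```python
from collections import Counter


def frequency_table(values):
    """Return (counts, percentages) for each distinct value, in first-seen order."""
    counts = Counter(values)
    total = sum(counts.values())
    percentages = {k: 100 * c / total for k, c in counts.items()}
    return dict(counts), percentages


# Hypothetical gender column: 7 males and 3 females, mirroring the 70%/30% split above
gender = ["Male"] * 7 + ["Female"] * 3
counts, percentages = frequency_table(gender)

# Simple text bar chart: one '#' per observation, followed by count and percentage
for category, count in counts.items():
    print(f"{category:<7}{'#' * count}  {count} ({percentages[category]:.0f}%)")
```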

**How to Analyze Dichotomous Variables**

There are several ways to analyze dichotomous variables. Two of the most common are:

**1. One proportion z-test**

A one proportion z-test uses an observed sample proportion to test whether the true population proportion is equal to some hypothesized value.

For example, we might use this test to determine if the true proportion of athletes who are male in some population is equal to 50%.
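The test statistic is $z = (\hat{p} - p_0) / \sqrt{p_0(1 - p_0)/n}$, where $\hat{p}$ is the sample proportion, $p_0$ the hypothesized proportion, and $n$ the sample size. Below is a minimal standard-library sketch of the two-sided version; the function name and the sample of 140 males out of 250 athletes are hypothetical. It relies on the normal approximation, which a common rule of thumb considers reasonable when $n p_0$ and $n(1 - p_0)$ are both at least about 10.

```python
import math


def one_proportion_z_test(successes, n, p0):
    """Two-sided one proportion z-test using the normal approximation.

    Returns (z, p_value) for H0: true proportion == p0.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical sample: 140 of 250 sampled athletes are male; test H0: p = 0.50
z, p_value = one_proportion_z_test(successes=140, n=250, p0=0.5)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Here the p-value lands slightly above 0.05, so at the 5% level this hypothetical sample would not be enough to reject the 50% hypothesis. For real analyses, statsmodels provides `proportions_ztest`.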

**2. Point-biserial correlation**

Point-biserial correlation is used to measure the relationship between a dichotomous variable and a continuous variable.

This type of correlation takes on a value between -1 and 1 where:

- -1 indicates a perfectly negative correlation between two variables
- 0 indicates no correlation between two variables
- 1 indicates a perfectly positive correlation between two variables

For example, we might calculate the point-biserial correlation between gender and average points per game to understand how strongly these two variables are related.
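One standard formula is $r_{pb} = \frac{M_1 - M_0}{s_n}\sqrt{\frac{n_1 n_0}{n^2}}$, where $M_1$ and $M_0$ are the means of the continuous variable in the two groups, $n_1$ and $n_0$ are the group sizes, and $s_n$ is the standard deviation of the continuous variable computed with divisor $n$. This gives the same value as an ordinary Pearson correlation with the dichotomous variable coded as 0/1. The sketch below is a minimal standard-library version with hypothetical data; coding male as 1 is an arbitrary choice, and swapping the coding flips the sign of the result.

```python
import math


def point_biserial(binary, continuous):
    """Point-biserial correlation between a 0/1 variable and a continuous one.

    Uses r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n**2), where s_n is the
    standard deviation (divisor n) of the continuous variable.
    """
    if len(binary) != len(continuous):
        raise ValueError("inputs must have the same length")
    n = len(continuous)
    group1 = [y for x, y in zip(binary, continuous) if x == 1]
    group0 = [y for x, y in zip(binary, continuous) if x == 0]
    n1, n0 = len(group1), len(group0)
    if n1 + n0 != n:
        raise ValueError("binary variable must be coded as 0/1")
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups must be non-empty")
    mean_all = sum(continuous) / n
    s_n = math.sqrt(sum((y - mean_all) ** 2 for y in continuous) / n)
    m1 = sum(group1) / n1
    m0 = sum(group0) / n0
    return (m1 - m0) / s_n * math.sqrt(n1 * n0 / n**2)


# Hypothetical data: gender coded 1 = male, 0 = female, with average points per game
gender = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
avg_points = [18.2, 15.1, 12.4, 20.3, 11.8, 16.7, 14.9, 13.5, 19.0, 17.6]
r = point_biserial(gender, avg_points)
print(f"r_pb = {r:.3f}")
```

SciPy's `scipy.stats.pointbiserialr` computes the same coefficient along with a p-value.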