AÂ box-cox transformation is a commonly used method for transforming a non-normally distributed dataset into a more normally distributed one.
The basic idea is to find some value for λ such that the transformed data is as close to normally distributed as possible, using the following formula:
- y(λ) = (yλ – 1) / λ if y ≠0
- y(λ) = log(y) if y = 0
The following step-by-step example shows how to perform a box-cox transformation on a dataset in Excel.
Step 1: Enter the Data
First, let’s enter the values for a dataset:
Step 2: Sort the Data
Next, create an index column and a column of sorted data:
Step 3: Choose an Arbitrary Value for Lambda
Next, we’ll choose an arbitrary value of 1 for lambda and apply a temporary box-cox transformation to the data:
Step 4: Calculate the Z-Scores
Next, we’ll calculate the z-score for each value in the index:
We’ll then calculate the correlation between the box-cox transformed values and the z-scores:
Step 5: Find the Optimal Lambda Value
Next, we’ll use Goal Seek to find the optimal lambda value to use in the box-cox transformation.
To do so, click the Data tab along the top ribbon. Then click What-If-Analysis within the Forecast group.
In the dropdown menu, click Goal Seek and fill in the following values:
Once you click OK, Goal Seek will automatically find the optimal lambda value to be -0.5225.
Step 6: Perform the Box-Cox Transformation
Lastly, we’ll apply the box-cox transformation to the original data, using a lambda value of -0.5225:
Bonus: We can confirm that the transformed data is normally distributed by performing a Jarque-Bera test in Excel.
Additional Resources
How to Transform Data in Excel (Log, Square Root, Cube Root)
How to Calculate Z-Scores in Excel