Cubic regression is a type of regression we can use to quantify the relationship between a predictor variable and a response variable when the relationship between the variables is non-linear.
This tutorial explains how to perform cubic regression in Python.
Example: Cubic Regression in Python
Suppose we have the following pandas DataFrame that contains two variables (x and y):
import pandas as pd

#create DataFrame
df = pd.DataFrame({'x': [6, 9, 12, 16, 22, 28, 33, 40, 47, 51, 55, 60],
                   'y': [14, 28, 50, 64, 67, 57, 55, 57, 68, 74, 88, 110]})

#view DataFrame
print(df)

     x    y
0    6   14
1    9   28
2   12   50
3   16   64
4   22   67
5   28   57
6   33   55
7   40   57
8   47   68
9   51   74
10  55   88
11  60  110
If we make a simple scatterplot of this data we can see that the relationship between the two variables is non-linear:
import matplotlib.pyplot as plt
#create scatterplot
plt.scatter(df.x, df.y)
As the value for x increases, y increases up to a certain point, then decreases, then increases once more.
This pattern with two “curves” in the plot is an indication of a cubic relationship between the two variables.
This means a cubic regression model is a good candidate for quantifying the relationship between the two variables.
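One rough way to check the "two curves" pattern numerically is to count how often the direction of y flips from rising to falling or back. The sketch below assumes the x values are already sorted and that no two consecutive y values are equal; the variable names are illustrative and not part of the original example.

```python
import numpy as np

# y values from the DataFrame above (x is already in increasing order)
y = np.array([14, 28, 50, 64, 67, 57, 55, 57, 68, 74, 88, 110])

# sign of each step between consecutive points: +1 rising, -1 falling
step_signs = np.sign(np.diff(y))

# a turning point is any place where the direction flips
turning_points = int(np.count_nonzero(np.diff(step_signs)))

print(turning_points)  # prints 2
```

Two turning points is the most a cubic polynomial can have, which is consistent with choosing a degree of 3. On noisier data this simple count would overstate the number of bends, so treat it as a quick sanity check only.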
To perform cubic regression, we can fit a polynomial regression model with a degree of 3 using the numpy.polyfit() function:
import numpy as np

#fit cubic regression model
model = np.poly1d(np.polyfit(df.x, df.y, 3))

#add fitted cubic regression line to scatterplot
polyline = np.linspace(1, 60, 50)
plt.scatter(df.x, df.y)
plt.plot(polyline, model(polyline))

#add axis labels
plt.xlabel('x')
plt.ylabel('y')

#display plot
plt.show()
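As an aside, the same fit can be sketched with NumPy's newer numpy.polynomial interface instead of np.polyfit() and np.poly1d(). This is an optional alternative and not the approach used in the rest of this tutorial; the names cubic, coef_ascending and coef_descending are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial

x = np.array([6, 9, 12, 16, 22, 28, 33, 40, 47, 51, 55, 60], dtype=float)
y = np.array([14, 28, 50, 64, 67, 57, 55, 57, 68, 74, 88, 110], dtype=float)

# fit a degree-3 polynomial; the fit is carried out on a rescaled x domain
cubic = Polynomial.fit(x, y, deg=3)

# convert back to the original x scale; coefficients come in ascending order
coef_ascending = cubic.convert().coef

# np.polyfit returns the highest power first, so reverse before comparing
coef_descending = np.polyfit(x, y, 3)
same_fit = np.allclose(coef_ascending[::-1], coef_descending)

print(same_fit)
```

Note the different coefficient order: Polynomial stores the constant term first, while np.polyfit() and np.poly1d() store the highest power first.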
We can obtain the fitted cubic regression equation by printing the model coefficients:
print(model)
          3          2
0.003302 x - 0.3214 x + 9.832 x - 32.01
The fitted cubic regression equation is:
y = 0.003302x^3 - 0.3214x^2 + 9.832x - 32.01
We can use this equation to calculate the expected value for y based on the value for x.
For example, if x is equal to 30, then the expected value for y is approximately 62.844 (using the rounded coefficients):
y = 0.003302(30)^3 - 0.3214(30)^2 + 9.832(30) - 32.01 = 62.844
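The same calculation can be done in code rather than by hand. The minimal sketch below refits the model so that it is self-contained, then computes the prediction at x = 30 twice: once from the fitted model object at full precision, and once from the rounded coefficients shown by print(model). The names y_pred and y_rounded are illustrative.

```python
import numpy as np

x = np.array([6, 9, 12, 16, 22, 28, 33, 40, 47, 51, 55, 60])
y = np.array([14, 28, 50, 64, 67, 57, 55, 57, 68, 74, 88, 110])

# fit the cubic model exactly as in the tutorial
model = np.poly1d(np.polyfit(x, y, 3))

# prediction at x = 30 using the full-precision coefficients
y_pred = model(30)

# the same calculation by hand, using the rounded coefficients from print(model)
y_rounded = 0.003302 * 30**3 - 0.3214 * 30**2 + 9.832 * 30 - 32.01

print(round(y_pred, 3), round(y_rounded, 3))
```

The two values can differ slightly because print(model) rounds each coefficient, and small rounding errors are magnified by the x^3 and x^2 terms. Calling model() directly avoids that.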
We can also write a short function to obtain the R-squared of the model, which is the proportion of the variance in the response variable that can be explained by the predictor variable.
#define function to calculate r-squared
def polyfit(x, y, degree):
    results = {}
    coeffs = np.polyfit(x, y, degree)
    p = np.poly1d(coeffs)

    #calculate r-squared
    yhat = p(x)
    ybar = np.sum(y)/len(y)
    ssreg = np.sum((yhat-ybar)**2)
    sstot = np.sum((y - ybar)**2)
    results['r_squared'] = ssreg / sstot

    return results

#find r-squared of polynomial model with degree = 3
polyfit(df.x, df.y, 3)

{'r_squared': 0.9632469890057967}
In this example, the R-squared of the model is 0.9632.
This means that 96.32% of the variation in the response variable can be explained by the predictor variable.
Because this value is close to 1, the cubic regression model fits the observed data well and does a good job of quantifying the relationship between the two variables.
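To put this R-squared in context, it can help to compare it against lower-degree fits on the same data. The sketch below is one way to do that, assuming the same x and y values as above; the helper r_squared is hypothetical and computes R-squared as 1 - SS_res / SS_tot, which matches the ratio used earlier for a least-squares fit with an intercept.

```python
import numpy as np

x = np.array([6, 9, 12, 16, 22, 28, 33, 40, 47, 51, 55, 60])
y = np.array([14, 28, 50, 64, 67, 57, 55, 57, 68, 74, 88, 110])

def r_squared(x, y, degree):
    """R-squared of a least-squares polynomial fit of the given degree."""
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    ss_res = np.sum((y - fitted) ** 2)       # unexplained variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation
    return 1 - ss_res / ss_tot

# compare linear, quadratic and cubic fits on the same data
r2_by_degree = {degree: r_squared(x, y, degree) for degree in (1, 2, 3)}

for degree, r2 in r2_by_degree.items():
    print(degree, round(r2, 4))
```

Raising the degree can never lower R-squared on the data used for fitting, so a higher value alone does not prove that the cubic model is the right one. What matters is whether the jump from degree 2 to degree 3 is large and whether the shape of the scatterplot supports it.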
Additional Resources
The following tutorials explain how to perform other common tasks in Python:
How to Perform Simple Linear Regression in Python
How to Perform Quadratic Regression in Python
How to Perform Polynomial Regression in Python