A moving average is a technique that can be used to smooth out time series data to reduce the “noise” in the data and more easily identify patterns and trends.
The idea behind a moving average is to take the average of the most recent n periods, up to and including the current one, to come up with a “moving average” for that period.
This tutorial explains how to calculate moving averages in Python.
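Before turning to library functions, the definition itself can be sketched in plain Python. The function name simple_moving_average and the toy list below are made up for illustration and are not part of the tutorial's own code; this is a minimal sketch, assuming a window of n consecutive values that ends at the current period.

```python
def simple_moving_average(values, n):
    """Return the mean of each run of n consecutive values."""
    if n < 1 or n > len(values):
        raise ValueError("n must be between 1 and len(values)")
    averages = []
    # The first complete window ends after n values, so start there.
    for end in range(n, len(values) + 1):
        window = values[end - n:end]
        averages.append(sum(window) / n)
    return averages


toy = [2, 4, 6, 8]  # hypothetical numbers, for illustration only
result = simple_moving_average(toy, 2)
print(result)  # [3.0, 5.0, 7.0]
```

The result has len(values) - n + 1 entries, because the first n - 1 periods do not have a full window behind them.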
Example: Moving Averages in Python
Suppose we have the following array that shows the total sales for a certain company during 10 periods:
x = [50, 55, 36, 49, 84, 75, 101, 86, 80, 104]
Method 1: Use the cumsum() function.
One way to calculate the moving average is to utilize the cumsum() function:
import numpy as np

#define moving average function
def moving_avg(x, n):
    #prepend 0 so that cumsum[i] holds the sum of the first i values
    cumsum = np.cumsum(np.insert(x, 0, 0))
    #difference of cumulative sums n apart gives each window's total
    return (cumsum[n:] - cumsum[:-n]) / float(n)

#calculate moving average using previous 3 time periods
n = 3
moving_avg(x, n)

array([47, 46.67, 56.33, 69.33, 86.67, 87.33, 89, 90])
Here is how to interpret the output:
- The moving average at the third period is 47. This is calculated as the average of the first three periods: (50+55+36)/3 = 47.
- The moving average at the fourth period is 46.67. This is calculated as the average of the three periods ending at the fourth (periods two through four): (55+36+49)/3 = 46.67.
And so on.
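To see why subtracting cumulative sums gives the same numbers, it may help to print the intermediate arrays. The sketch below reuses the sales data and n = 3 from above; the variable names padded and window_sums are only illustrative.

```python
import numpy as np

x = [50, 55, 36, 49, 84, 75, 101, 86, 80, 104]
n = 3

# Prepend 0 so that padded[i] is the sum of the first i values.
padded = np.cumsum(np.insert(x, 0, 0))
print(padded.tolist())
# [0, 50, 105, 141, 190, 274, 349, 450, 536, 616, 720]

# The total of the n values ending at position i is padded[i] - padded[i - n].
window_sums = padded[n:] - padded[:-n]
print(window_sums.tolist())
# [141, 140, 169, 208, 260, 262, 267, 270]

# Dividing each window total by n gives the moving average.
averages = window_sums / n
```

The first two window totals, 141 and 140, are the sums (50+55+36) and (55+36+49) from the interpretation above.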
Method 2: Use pandas.
Another way to calculate the moving average is to use the rolling() method in pandas:
import pandas as pd

#define array to use and number of previous periods to use in calculation
x = [50, 55, 36, 49, 84, 75, 101, 86, 80, 104]
n = 3

#calculate moving average
pd.Series(x).rolling(window=n).mean().iloc[n-1:].values

array([47, 46.67, 56.33, 69.33, 86.67, 87.33, 89, 90])
This method produces the same results as the previous method, and it can run faster on larger arrays, although the difference is worth measuring on your own data.
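The sketch below is one way to check that the two methods agree, and it also shows why the pandas version slices with .iloc[n-1:]. It assumes the same data and n = 3 as above; np.allclose is used rather than strict equality because floating-point results from two different computations can differ in the last digits.

```python
import numpy as np
import pandas as pd

x = [50, 55, 36, 49, 84, 75, 101, 86, 80, 104]
n = 3

rolled = pd.Series(x).rolling(window=n).mean()

# The first n - 1 entries are NaN because their windows are incomplete.
leading_nans = int(rolled.isna().sum())
print(leading_nans)  # 2

# Dropping those entries leaves one average per complete window.
pandas_result = rolled.iloc[n - 1:].to_numpy()

# Same calculation with the cumulative-sum approach from Method 1.
cumsum = np.cumsum(np.insert(x, 0, 0))
cumsum_result = (cumsum[n:] - cumsum[:-n]) / n

print(np.allclose(pandas_result, cumsum_result))  # True
```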
Note that you can also specify any number of previous time periods to use in the calculation of the moving average. For example, perhaps you’d rather use n=5:
#use 5 previous periods to calculate moving average
n = 5

#calculate moving average
pd.Series(x).rolling(window=n).mean().iloc[n-1:].values

array([54.8, 59.8, 69. , 79. , 85.2, 89.2])
The more periods you use to calculate the moving average, the smoother the moving average line will be.
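One rough way to see the smoothing effect is to apply windows of different sizes to a noisy series and compare how much the averaged values still vary. The data below is synthetic (random noise around a level of 100, generated with a fixed seed) and is an assumption made for this sketch, not the sales data from the example.

```python
import numpy as np

# Synthetic series: constant level of 100 plus random noise (illustrative only).
rng = np.random.default_rng(0)
noisy = 100 + rng.normal(0, 10, size=1000)


def moving_avg(x, n):
    # Same cumulative-sum approach as Method 1.
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[n:] - cumsum[:-n]) / n


# Standard deviation of the averaged series for several window sizes.
# A window of 1 leaves the series unchanged, so it serves as the baseline.
spread = {n: float(np.std(moving_avg(noisy, n))) for n in (1, 3, 10)}
for n, s in spread.items():
    print(n, round(s, 2))
```

The printed spread should shrink as n grows. The trade-off is that a longer window also reacts more slowly to real changes in the underlying series.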