Smoothing
- By Datamago team
- Published April 2, 2022
Smoothing is a simple yet powerful technique to reduce noise in raw data. Noise distracts us from the underlying pattern and may cause unrealistic artifacts in a forecast. Fluctuations in our data due to actual events can also be noise if those events occur randomly, making them impractical to predict.
How smoothing works
Smoothing refers to replacing each value in a dataset with the average of a certain number of values around it. The surrounding values are commonly referred to as a window. As a rule of thumb, larger windows create smoother data.
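To make the idea concrete, here is a minimal sketch of a centered moving average in plain Python. It is an illustration only, not necessarily how Datamago computes smoothing. The function name `smooth`, the way the window is shortened at the edges, and the sample values are all assumptions made for this example.

```python
def smooth(values, window):
    """Replace each value with the average of the values in a window around it.

    Assumption: the window is centered on each point and is simply shortened
    near the start and end of the series, where fewer neighbors exist.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    before = (window - 1) // 2  # neighbors taken from before the point
    after = window // 2         # neighbors taken from after the point
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - before): i + after + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up monthly values, for illustration only.
raw = [10, 14, 9, 15, 11, 16, 12]
smoothed_3 = smooth(raw, 3)
smoothed_5 = smooth(raw, 5)

print(smoothed_3)
print(smoothed_5)
```

In this toy series the raw values span 7 units, while the smoothed values span 3 units with a window of 3 and 2.5 units with a window of 5, which matches the rule of thumb that larger windows create smoother data.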
When to use it
Smoothing is most useful when the raw data contains medium to high levels of evenly distributed noise. Let’s take a look at two examples:
Medium noise
U.S. wind energy production (source) is a good example of data that exhibits medium noise. The trend and seasonality are clearly defined with small fluctuations. A smoothing window of about 3 months should be sufficient.
High noise
Monthly avocado sales [1] in the US from 2015 to the beginning of 2018 exhibit a high level of noise. Yearly seasonality, where sales increase in the spring and drop in the fall, is somewhat visible, but the jagged and irregular month-to-month movement is a defining feature. A window of 6 or more months will be required.
Important note: Smoothing isn’t the best option when there are anomalies or changing patterns that create unevenly distributed noise. See the end of this article for more information and learn how to fix anomalies here.
How to apply smoothing
- Sign in to your Datamago account, then open the forecast configuration sidebar either by clicking the ‘New’ button on the home page and selecting a file, or selecting the ‘configuration’ menu option of a pre-existing forecast.
- Scroll down and click on ‘advanced options’.
- Find the data smoothing option and select a window size in the dropdown.
What difference does it make?
Below, we'll compare the medium and high noise datasets with and without smoothing. You'll likely need to experiment with different smoothing windows to achieve the best results.
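If you want to get a feel for how the window size changes the data before trying it on a forecast, the sketch below may help. It builds a synthetic monthly series (a yearly cycle plus random noise, not real data), smooths it with a few window sizes, and prints a simple "roughness" score: the average absolute month-to-month change. The helper names `smooth` and `roughness`, the centered window, and the score itself are assumptions made for this illustration rather than anything Datamago reports.

```python
import math
import random


def smooth(values, window):
    """Centered moving average; the window is shortened at the edges."""
    before = (window - 1) // 2
    after = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - before): i + after + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def roughness(values):
    """Average absolute change from one month to the next."""
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(steps) / len(steps)


# Synthetic data: four years of a yearly cycle plus random noise.
rng = random.Random(0)  # fixed seed so the run is repeatable
raw = [
    100 + 20 * math.sin(2 * math.pi * month / 12) + rng.gauss(0, 10)
    for month in range(48)
]

# A window of 1 leaves the data unchanged, so it serves as the baseline.
results = {w: roughness(smooth(raw, w)) for w in (1, 3, 6, 12)}
for w, score in results.items():
    print(f"window={w:>2}  roughness={score:.2f}")
```

A lower score means a smoother series, but smoother is not always better: a window as long as the seasonal cycle (12 months here) averages the seasonal pattern away along with the noise.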
Medium noise without smoothing
Notice how the forecast (yellow line) basically follows the trend. This isn’t entirely unreasonable if you assume that the underlying historical context will shift and you want to avoid forecasting the seasonal pattern. For the purposes of this example, though, we’ll assume that the underlying historical context will continue, so a forecast that only follows the trend isn’t what we want.
Medium noise with smoothing
This is the result after smoothing the data with a 3-month window. You can see that much of the noise is gone, which results in a cleaner-looking forecast.
High noise without smoothing
In this case, the forecast (yellow line) looks similar to the year before it. However, the underlying historical context is irregular (i.e. sales don’t go up or down in the same months from year to year), so it may not be reasonable to assume—unless experience and domain expertise say otherwise—that the most recent year’s pattern will continue in the future.
High noise with smoothing
This is the result after smoothing the data with a 6-month window. The historical data's seasonal pattern is much more stable and easier to predict, which is reflected in the forecast.
A note about uneven noise
Smoothing is a viable option when the noise is more or less evenly distributed. To illustrate, here’s an alternate version of the medium noise example with uneven noise due to the anomalies circled in red.
This is the result after smoothing the data with a 3-month window. The seasonality is uneven and the dip has turned into a sharp point. Larger windows don’t improve the situation much either, so it’s better to correct the unevenness before smoothing. Learn more about fixing anomalies here.
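A small sketch can show why anomalies and smoothing don't mix well. In the made-up series below, one value is an obvious dip. Averaging spreads that dip across every window that contains it, so three smoothed points end up distorted instead of one. Replacing the anomaly with the average of its two neighbors is just one simple correction, chosen here for illustration; it is an assumption, not Datamago's anomaly-fixing method.

```python
def smooth(values, window):
    """Centered moving average; the window is shortened at the edges."""
    before = (window - 1) // 2
    after = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - before): i + after + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Made-up data: the value at index 4 is an anomalous dip.
series = [20, 22, 21, 23, 2, 22, 21, 23, 22]

# Smoothing the raw series drags down every point whose window holds the dip.
smoothed_raw = smooth(series, 3)

# Hypothetical fix: replace the anomaly with the average of its neighbors,
# then smooth the corrected series.
fixed = list(series)
fixed[4] = (series[3] + series[5]) / 2
smoothed_fixed = smooth(fixed, 3)

print([round(v, 2) for v in smoothed_raw])
print([round(v, 2) for v in smoothed_fixed])
```

With the raw series, the smoothed values at indexes 3, 4, and 5 all fall to roughly 15, even though only one original value was off. After the correction, every smoothed value stays between 21 and 22.5.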
1. Source: https://www.kaggle.com/datasets/neuromusic/avocado-prices. Data originally from the Hass Avocado Board. The total US avocado volume shown in this post was aggregated from weekly to monthly.