A Guide to Forecasting After COVID
- By Datamago team
- Published April 29, 2022
Many datasets have been affected in one way or another by COVID. It’s not uncommon to see graphs with a large section, starting at the onset of the pandemic in early 2020, that is higher, lower, or more volatile than the established historical pattern.
Whether your data has already returned to normal or is in the middle of a recovery, predicting the future is difficult when the underlying historical context has suddenly shifted, especially in the recent past. The way to address this shift is to adjust the data: either remove historical context that is no longer relevant, or replace pandemic-era values with predictions based on the more stable, pre-pandemic data.
Option I: Remove irrelevant data
Datamago’s trimming feature allows you to dynamically remove rows from the beginning (and end) of the dataset. This is a viable option if one of the following assumptions is true:
- Your data is in the process of returning to normal and you expect the trend to continue during the forecast period. The airline industry is a good example: people will likely resume normal travel in the future, but the recovery is slow.
- There’s been a fundamental shift in the underlying conditions and the data will be intrinsically different moving forward. The workforce is a good example as a higher percentage of people will work from home from now on.
If either is true, the pre-pandemic data is irrelevant for forecasting the future, and leaving it in the training data creates unnecessary noise. You may also consider removing every row before the start of the recovery, pandemic-era rows included, if the recovery looks substantially different from the pandemic data.
Regardless of where the data is trimmed to, the key is to remove the noise and start fresh. Depending on how far you need to trim, you may also need to make the validation set smaller, since the training portion of the data needs to be at least three times as long as the validation set. This is demonstrated in the trimming example at the end of this post.
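To make the trimming and sizing rule concrete, here is a minimal sketch in plain Python. It is not Datamago’s implementation: the helper names (trim_rows, max_validation_size), the sample data, and the cutoff date are all hypothetical, and the sizing rule is read as "training rows >= 3 x validation rows".

```python
from datetime import date


def trim_rows(rows, start, end=None):
    """Keep only (date, value) rows with start <= date <= end."""
    return [(d, v) for d, v in rows if d >= start and (end is None or d <= end)]


def max_validation_size(n_rows, ratio=3):
    """Largest validation length that keeps training >= ratio * validation.

    training = n_rows - validation >= ratio * validation
    => validation <= n_rows / (ratio + 1)
    """
    return n_rows // (ratio + 1)


# Hypothetical monthly series: January 2018 through June 2021 (42 rows).
rows = [(date(2018 + i // 12, i % 12 + 1, 1), 100 + i) for i in range(42)]

# Assumed cutoff: drop everything before the recovery begins (April 2020 here).
trimmed = trim_rows(rows, start=date(2020, 4, 1))

# Size the validation set so the remaining training rows still satisfy the rule.
val_size = max_validation_size(len(trimmed))
split = len(trimmed) - val_size
train, validation = trimmed[:split], trimmed[split:]

print(len(trimmed), len(train), len(validation))
```

The further you trim, the fewer rows remain, so the largest validation set that still satisfies the rule shrinks with it.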
Option II: Historical forecasting
This is a viable option if your data has already returned to normal or you’re comfortable with the assumption that normality will return soon. By ‘normal’, we mean a return to the pre-pandemic historical pattern.
Historical forecasting replaces actual values with predictions. Because the predicted values are based on the stable parts of the data (which should be the majority), the replacement has a repairing effect: the impact of COVID is effectively erased from your historical data.
Keep in mind that the more stable and predictable your pre-pandemic data is, the better the historical forecast will be.
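The idea can be sketched in a few lines of plain Python. This is only an illustration: it swaps in a simple seasonal-naive-with-drift predictor rather than whatever model Datamago uses, and the function names, the quarterly season length, and the sample numbers are all hypothetical.

```python
def seasonal_drift_forecast(history, horizon, season=12):
    """Predict `horizon` steps ahead from `history`.

    Each prediction is the value one season earlier plus the average
    season-over-season change observed in `history`.
    """
    if len(history) < 2 * season:
        raise ValueError("need at least two full seasons of history")
    diffs = [history[i] - history[i - season] for i in range(season, len(history))]
    drift = sum(diffs) / len(diffs)
    extended = list(history)
    for _ in range(horizon):
        extended.append(extended[-season] + drift)
    return extended[len(history):]


def repair(values, start, end, season=12):
    """Replace values[start:end] with predictions based only on values[:start]."""
    predicted = seasonal_drift_forecast(values[:start], end - start, season)
    return values[:start] + predicted + values[end:]


# Hypothetical quarterly data: two stable years, one disrupted year,
# then two quarters back on the old pattern.
values = [10, 20, 30, 20, 12, 22, 32, 22, 1, 2, 3, 2, 16, 26]
repaired = repair(values, start=8, end=12, season=4)
print(repaired[8:12])  # [14.0, 24.0, 34.0, 24.0]
```

Only the rows in the disrupted window change; everything before and after it is left as recorded.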
Note: You could use variables to explain the section of your data affected by COVID, but finding the right ones can be time-consuming and there's no guarantee of a better result. As such, one of the previously mentioned options is recommended if it works for you.
Example - International Arrivals at SFO
We’ll use passenger arrivals to the San Francisco International Airport (SFO) to illustrate how trimming and historical forecasting can improve forecasts from data impacted by COVID. The airline industry was hit particularly hard so it’ll serve as an interesting yet challenging use case.
Below is a graph of the passenger arrivals from 2005 to June 2021 (the most recent data available at the time of writing). As you can see, the data changes drastically once the pandemic starts in the spring of 2020. Attempting to forecast this data without any adjustments would yield undesirable results.
Historical forecasting
When replacing the pandemic-era data with predictions, the assumption is that the data either has returned to normal or is expected to do so in the near future. This outcome isn’t very likely in the case of SFO passengers, but we’ll proceed anyway to demonstrate how the feature works. As you can see in the video below, only one historical forecast is required since the data leading up to the pandemic is stable and predictable.
Below is the resulting forecast with the dotted blue line indicating the predictions that replaced the original values:
Trimming
For this example, the assumption is that the passenger count will continue to steadily trend up towards its pre-pandemic historical pattern. Several adjustments were made to make the trend easier to learn and forecast: first, we made the validation set smaller since, regardless of trimming, the dataset needs to be at least three times as long as the validation set; second, we reduced the variance by adjusting some of the historical values, since we’re more interested in forecasting the trend than potentially spurious fluctuations; and finally, smoothing was applied (learn more here).
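As a rough illustration of the smoothing step, the sketch below applies a trailing moving average in plain Python. The post does not say which smoothing method Datamago applies, so the technique, the function name, the window size, and the sample numbers are all assumptions.

```python
def moving_average(values, window=3):
    """Trailing moving average.

    The first few points average over however many values are available,
    so the output has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical noisy series with an upward trend.
noisy = [10, 14, 9, 16, 13, 20, 15, 22]
smooth = moving_average(noisy, window=3)

# The smoothed series spans a narrower range while keeping the upward drift.
print(max(noisy) - min(noisy), max(smooth) - min(smooth))
```

A wider window removes more of the short-term fluctuation, at the cost of reacting more slowly to genuine changes in the trend.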
The resulting forecast propagates the upward trend (with 25% year-over-year growth) towards SFO’s pre-pandemic passenger count: