Does the Past Predict the Future?

  • By Datamago team
  • Published March 16, 2022
  • 3 min read

It’s easy to get bogged down by details when forecasting. We’re constantly faced with a variety of internal and external factors that affect our data in different ways over time. That’s why the first and most important step in the forecasting process is to look at the raw historical data as a whole and ask ourselves, “Is what I’m seeing a reasonable representation of the future?” To help answer this question, we can classify data into one of three generic levels of predictability:

Raw data predictability: trend and/or seasonal consistency
  • High: perfectly consistent over time
  • Medium: somewhat to mostly consistent over time, with spikes and dips that may appear random to those not familiar with the data
  • Low: little to no identifiable trend or seasonality, and/or a pattern that constantly changes over time

High predictability

Minimal to no intervention is required to forecast highly predictable raw data, as long as you’re comfortable assuming that future conditions will remain the same. A perfect example is the level of atmospheric CO2 measured at Mauna Loa, Hawaii [1]. Below, you can see a steady upward trend with consistent yearly seasonality, no anomalous values, and little to no fluctuation. If your data looks similar to this, you can forecast it quickly and easily with Datamago.
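To make “assuming future conditions will remain the same” concrete, here is a minimal Python sketch of one simple approach, a seasonal naive forecast with drift: repeat the last full cycle and shift it by the average cycle-over-cycle change. This is not necessarily how Datamago forecasts, and the function name `seasonal_naive_with_drift` is hypothetical. It assumes evenly spaced values with no gaps and a known seasonal period (for monthly readings with yearly seasonality, `period=12`).

```python
from statistics import mean


def seasonal_naive_with_drift(values, period, horizon):
    """Repeat the last full seasonal cycle, shifted by the average change
    between each value and the value one cycle earlier.

    Hypothetical helper; assumes evenly spaced data and a known period.
    """
    if len(values) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    # Average cycle-over-cycle change (with monthly data and period=12,
    # this is the average year-over-year change)
    drift = mean(values[t] - values[t - period] for t in range(period, len(values)))
    last_cycle = values[-period:]
    forecast = []
    for h in range(horizon):
        cycles_ahead = h // period + 1  # cycles past the last observed one
        forecast.append(last_cycle[h % period] + cycles_ahead * drift)
    return forecast


# Toy series: linear trend plus a fixed 4-step seasonal pattern
pattern = [1, -1, -1, 1]
history = [10 + 0.5 * t + pattern[t % 4] for t in range(12)]
print(seasonal_naive_with_drift(history, period=4, horizon=6))
```

On perfectly consistent data like this toy series, the forecast simply continues the pattern. On real data it is only as good as the assumption that both the trend and the seasonal shape hold steady.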

Medium predictability

The predictability of most raw datasets lands somewhere in the middle because of varying degrees of anomalous values and/or noise. These datasets require some level of inspection and preparation before they can produce a reasonable forecast.

Below is an example dataset of a fictional restaurant in a tourist town. You can see a consistent trend and yearly seasonality similar to the Mauna Loa dataset above, but there are a few anomalies circled in red.

Note: Datamago highlights outliers automatically; however, the circled anomalies aren’t extreme enough compared to the surrounding values to be flagged. In these cases, you’ll have to make a judgment call.
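Datamago’s own outlier rule isn’t described here, so the following is only a sketch of one common generic approach, a rolling median with a median absolute deviation (MAD) score. It illustrates why mild anomalies can slip through: a point is flagged only when it sits far from its neighbours relative to their typical spread. The helper `flag_outliers` and its `window` and `threshold` defaults are assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(values, window=7, threshold=3.5):
    """Return indices whose value is far from the median of its neighbours,
    measured in units of the neighbours' median absolute deviation (MAD).

    Hypothetical helper; window and threshold are illustrative defaults.
    """
    half = window // 2
    flagged = []
    for i, v in enumerate(values):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        neighbours = values[lo:i] + values[i + 1:hi]  # exclude the point itself
        med = median(neighbours)
        mad = median(abs(x - med) for x in neighbours)
        if mad == 0:
            continue  # flat neighbourhood: no spread to compare against, skip
        # 0.6745 rescales MAD so the score is comparable to a z-score
        # when the data is roughly normal
        score = 0.6745 * abs(v - med) / mad
        if score > threshold:
            flagged.append(i)
    return flagged


sales = [10, 12, 11, 13, 12, 11, 40, 12, 13, 11, 12, 10, 11, 12]
print(flag_outliers(sales))
```

Only the spike at index 6 clears the threshold in this example. Lowering `threshold` would catch milder bumps, at the cost of flagging more ordinary variation, which is exactly where a human judgment call comes in.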

As in real life, the circled anomalies could have a number of causes, such as festivals, renovations, and local policies, to name a few. How we deal with them depends on whether or not they’re expected to occur again in the future. Learn more about easily fixing anomalous values here.
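If an anomaly is not expected to recur, one simple option is to replace it with a value interpolated from the surrounding normal data, so it doesn’t distort the forecast. The sketch below assumes you already know which indices are anomalous (for example, from a manual review); the helper name `interpolate_over` is hypothetical, and linear interpolation is just one reasonable choice.

```python
def interpolate_over(values, bad_indices):
    """Replace each value at bad_indices with a straight-line interpolation
    between the nearest good values on either side.

    Hypothetical helper; linear interpolation is an illustrative choice.
    """
    bad = set(bad_indices)
    good = [i for i in range(len(values)) if i not in bad]
    if not good:
        raise ValueError("no good values to interpolate from")
    result = list(values)
    for i in sorted(bad):
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            result[i] = values[right]  # no earlier good value: copy the next one
        elif right is None:
            result[i] = values[left]  # no later good value: copy the previous one
        else:
            frac = (i - left) / (right - left)
            result[i] = values[left] + frac * (values[right] - values[left])
    return result


print(interpolate_over([10, 12, 40, 16, 18], [2]))
```

If the anomaly *is* expected to recur (an annual festival, say), it is usually better to leave it in place so the forecast can reproduce it.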

Aside from anomalies, small fluctuations throughout the data are also common and fairly easy to manage through a technique called smoothing. To illustrate, here’s a different version of the restaurant’s sales with a small amount of noise instead of the anomalies. Learn more about smoothing here.
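The article doesn’t specify which smoothing method Datamago uses, so here is a minimal sketch of one of the most common, a centered moving average: each point is replaced by the average of itself and its neighbours. The function name and the edge handling (shrinking the window near the ends) are assumptions for illustration.

```python
def moving_average(values, window=3):
    """Centered moving average. Near the edges, the window shrinks to the
    values that are available.

    Hypothetical helper; window must be a positive odd number.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


print(moving_average([1, 2, 3, 4, 5], window=3))
```

A wider window removes more noise but also flattens genuine peaks and troughs, so for seasonal data it’s worth keeping the window well short of the seasonal period.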

Low predictability

Data with low predictability is characterized by extensive and inconsistent fluctuations. These may be caused by random events, data collection issues, missing values, or high-amplitude noise, or the data may simply be intrinsically unpredictable. Here’s the restaurant sales dataset again with the noise level turned up so that the yearly seasonality is only slightly discernible. In cases like this, it’s usually best either to smooth the data as much as possible or, if the noise overwhelms the seasonality, to project the mean. Another option is to resample the data to a broader time scale (for example, convert daily data to weekly or monthly).
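The last two options can be sketched in a few lines of Python. This is a hedged illustration rather than Datamago’s implementation: `resample` assumes the first value falls on a block boundary (for example, a Monday when converting to weekly) and drops any trailing partial block, and both function names are hypothetical. Summing suits totals like sales; for averages such as temperature you would pass `how=mean` instead.

```python
from statistics import mean


def resample(values, size, how=sum):
    """Aggregate consecutive blocks of `size` values (size=7 turns daily data
    into weekly data). Assumes values[0] starts a block; a trailing partial
    block is dropped.
    """
    full = len(values) - len(values) % size
    return [how(values[i:i + size]) for i in range(0, full, size)]


def mean_forecast(values, horizon):
    """Project the historical mean forward as a flat line."""
    level = mean(values)
    return [level] * horizon


daily = list(range(1, 16))  # 15 days of toy data
weekly = resample(daily, 7)
print(weekly)  # two full weeks; day 15 is dropped
print(mean_forecast(weekly, 3))
```

Resampling averages away day-to-day noise, which can make a seasonal pattern visible again; projecting the mean is the fallback when no pattern survives.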

Random walk

A random walk deserves its own category because, in theory, it cannot be forecast from historical values. Data is considered a random walk when each value equals the previous value plus a random step that is equally likely to go up or down. In short, no reliable pattern exists, and attempting to forecast it is very risky if not impossible, even if there’s a trend that appears to be extrapolatable.
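A short simulation shows why. The sketch below generates a simple symmetric random walk, where each step is +1 or −1 with equal probability, so the expected value of the next point is just the current point. That makes the “naive” forecast (hold the last value flat) the sensible baseline. Both function names are hypothetical, and the seed parameter exists only to make runs repeatable.

```python
import random


def random_walk(steps, start=0, seed=None):
    """Simple symmetric random walk: each step moves +1 or -1 with equal
    probability, independently of all previous steps."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path


def naive_forecast(values, horizon):
    """For a random walk, the best point forecast is the last observed value."""
    return [values[-1]] * horizon


for seed in range(3):
    path = random_walk(250, seed=seed)
    print(seed, path[-1], naive_forecast(path, 3))
```

Running it with different seeds produces paths that drift upward, drift downward, or wander around the start, even though every step is a coin flip. Any “trend” in a single path is an artifact of chance, which is why extrapolating it is risky.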

Random walk theory is often applied to the stock market, where the assumption is that past movements or trends do not predict the price in the long run. Below is the S&P 500’s daily closing value throughout 2018. Based on this graph alone, without taking external factors into account, we couldn’t estimate with any reasonable confidence what the future price may look like.
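One rough way to check whether your own series behaves like a random walk is to look at its period-to-period changes. If those changes are close to uncorrelated (lag-1 autocorrelation near zero), yesterday’s move tells you little about today’s. This is only a quick heuristic under that assumption, not a formal test (the augmented Dickey-Fuller test is the standard formal check), and the helper names below are hypothetical.

```python
from statistics import mean


def difference(values):
    """Period-to-period changes: values[t] - values[t-1]."""
    return [b - a for a, b in zip(values, values[1:])]


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of a sequence."""
    m = mean(values)
    den = sum((v - m) ** 2 for v in values)
    if den == 0:
        raise ValueError("constant sequence has no defined autocorrelation")
    num = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    return num / den


print(difference([1, 4, 9, 16]))
print(lag1_autocorrelation([1, 2, 3, 4, 5]))
```

For a hypothetical list of daily closing prices `closes`, you would compute `lag1_autocorrelation(difference(closes))`. A value near zero is consistent with, but does not prove, a random walk, while a clearly positive or negative value suggests the changes carry some pattern worth modelling.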

Anyone with a Datamago account can easily improve their data's predictability with the techniques mentioned in this article!

1. Source: Dr. Pieter Tans, NOAA/GML (gml.noaa.gov/ccgg/trends/) and Dr. Ralph Keeling, Scripps Institution of Oceanography (scrippsco2.ucsd.edu)