Three foundational forecasting models compared

Which forecasting model performs best?

Foundation models represent a possible evolution in time series forecasting, often outperforming traditional models. But which of them is the most accurate in predicting the hourly PUN (Prezzo Unico Nazionale, Italy's national single price for electricity)? This study compares three foundational approaches for univariate forecasting: TimeGPT, TimesFM, and Tiny Time Mixers. The models differ in architecture, temporal context handling, and normalization strategy, offering a broad overview of the advanced solutions available for PUN forecasting.

Foundation models for forecasting

Foundation models for time series forecasting are machine learning models, typically large-scale, pre-trained on vast collections of heterogeneous time series. Thanks to this large-scale training, they can generate accurate predictions without requiring fine-tuning on the specific dataset. The aim of this study is to verify whether these models are indeed capable of delivering good performance in predicting the PUN in a zero-shot setting. [1]

To this end, we perform a rolling-window forecast over three prediction horizons: 3 days (P3), 5 days (P5), and 7 days (P7), repeated over 16 consecutive weeks. To compare accuracy, for each horizon we compute both the average and the maximum Root Mean Squared Error (RMSE) across the 16 windows. [2]
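
As an illustration only, a minimal Python sketch of such an evaluation loop is shown below; `forecast_fn` stands in for any of the three models and `pun_hourly` for the hourly price series (both hypothetical names), and the exact placement of the windows is an assumption:

```python
import numpy as np
import pandas as pd

def rolling_rmse(series: pd.Series, forecast_fn, horizon_hours: int,
                 n_windows: int = 16, step_hours: int = 7 * 24):
    """Advance a cutoff one week at a time, forecast `horizon_hours` ahead
    of each cutoff, and collect the RMSE of every window."""
    rmses = []
    first_cutoff = len(series) - (n_windows - 1) * step_hours - horizon_hours
    for i in range(n_windows):
        cutoff = first_cutoff + i * step_hours
        history = series.iloc[:cutoff]            # data visible to the model
        actual = series.iloc[cutoff:cutoff + horizon_hours].to_numpy()
        predicted = np.asarray(forecast_fn(history, horizon_hours))
        rmses.append(np.sqrt(np.mean((actual - predicted) ** 2)))
    return float(np.mean(rmses)), float(np.max(rmses))

# P3 = 72 h, P5 = 120 h, P7 = 168 h, e.g.:
# avg_rmse, max_rmse = rolling_rmse(pun_hourly, forecast_fn, horizon_hours=72)
```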


TimeGPT Forecasts

We begin the comparison with TimeGPT, the first true foundation model dedicated to time series forecasting. It was developed by Nixtla, a startup founded in 2021 that specializes in software for time series analysis and forecasting. The model is simple to configure but, unlike the two models examined next, it is not fully open source.
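
As a rough sketch, a zero-shot TimeGPT call through Nixtla's Python client looks like the following; the API key placeholder, the input file, and the use of the client's default `ds`/`y` column names are assumptions:

```python
import pandas as pd
from nixtla import NixtlaClient

# TimeGPT is served through Nixtla's API, so a key is required.
client = NixtlaClient(api_key="YOUR_NIXTLA_API_KEY")

# Hourly PUN history: 'ds' = timestamp, 'y' = price (hypothetical file).
df = pd.read_csv("pun_hourly.csv", parse_dates=["ds"])

# Forecast the next 7 days (168 hours) in one call.
fcst = client.forecast(df=df, h=7 * 24, freq="H",
                       time_col="ds", target_col="y")
```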
Below are the RMSE values produced by the forecasting cycle:

Period   Average RMSE (€/MWh)   Maximum RMSE (€/MWh)   Average RMSE%   Maximum RMSE%
P7       15.813                 27.246                 14.55%          24.58%
P5       12.789                 23.054                 11.24%          21.20%
P3       11.624                 25.449                 10.26%          22.24%

As expected, the TimeGPT results show average error growing with the prediction horizon, a pattern we expect the other models to share. Performance is nevertheless already very good: the model delivers more accurate forecasts than several traditional approaches, despite requiring no additional training on hourly PUN data.

TimesFM Forecasts

TimesFM is an open-source foundation model for time series forecasting developed by Google Research. Although large-scale, it remains relatively compact (around 200 million parameters) compared to large language models such as GPT. It was trained on billions of heterogeneous data points, enabling accurate zero-shot forecasting across many types of time series. The RMSE values obtained from the forecasts are shown below:
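
For reference, a zero-shot call under version 1.x of the timesfm package looked roughly like the sketch below; the checkpoint id and hyperparameters follow the project's published examples, `pun_hourly` is a hypothetical array of past prices, and the API may differ in later releases:

```python
import numpy as np
import timesfm

# Load the ~200M-parameter checkpoint from the Hugging Face Hub.
tfm = timesfm.TimesFm(
    hparams=timesfm.TimesFmHparams(
        backend="cpu",
        per_core_batch_size=32,
        horizon_len=168,  # long enough for the 7-day horizon
    ),
    checkpoint=timesfm.TimesFmCheckpoint(
        huggingface_repo_id="google/timesfm-1.0-200m-pytorch"
    ),
)

history = np.asarray(pun_hourly, dtype=np.float32)  # past hourly prices
point_forecast, _ = tfm.forecast(
    inputs=[history],  # a batch containing a single series
    freq=[0],          # 0 = high-frequency data in TimesFM's convention
)
```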

Period   Average RMSE (€/MWh)   Maximum RMSE (€/MWh)   Average RMSE%   Maximum RMSE%
P7       16.542                 28.829                 15.38%          31.15%
P5       12.642                 22.359                 11.07%          18.32%
P3       11.871                 24.241                 10.43%          21.18%

From the table, we observe that TimesFM maintains solid predictions across all three forecast horizons. As expected, the average error increases with the prediction horizon: P7 shows the highest values, followed by P5 and P3. In percentage terms, the relative error remains contained, confirming the model's ability to adapt to different contexts even in zero-shot mode. These results show that TimesFM offers competitive performance, slightly below that of larger models such as TimeGPT, with the advantage of being open source.

Tiny Time Mixers (TTM) Forecasts

Tiny Time Mixers (TTM), developed by IBM Research, offer an alternative to large foundation models: a much lighter and more compact architecture, with roughly 1 million parameters, designed for greater computational efficiency. [3]
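
A minimal zero-shot sketch with IBM's open-source tsfm_public toolkit follows; the checkpoint id and tensor shapes are taken from the library's published examples, `pun_hourly` is a hypothetical array, and horizons beyond the checkpoint's default 96 steps would require a longer-horizon variant or an iterative scheme:

```python
import torch
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

# Pre-trained TTM checkpoint (512-step context, 96-step horizon by default).
model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm-granite/granite-timeseries-ttm-r2"
)
model.eval()

# past_values: (batch, context_length, n_channels); one univariate series here.
past_values = torch.tensor(pun_hourly[-512:],
                           dtype=torch.float32).reshape(1, 512, 1)
with torch.no_grad():
    output = model(past_values=past_values)
forecast = output.prediction_outputs.squeeze()  # next 96 hourly prices
```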
The results obtained from the forecasts are shown below:

Period   Average RMSE (€/MWh)   Maximum RMSE (€/MWh)   Average RMSE%   Maximum RMSE%
P7       16.748                 25.442                 15.46%          27.27%
P5       14.095                 25.326                 12.28%          20.11%
P3       11.466                 25.383                 10.11%          22.18%

Tiny Time Mixers deliver solid predictions across all three horizons. Compared to TimeGPT and TimesFM, TTM show similar performance in the short term (P3), even achieving a slightly lower average RMSE. Over the longer horizons (P5 and P7), however, the error grows more sharply, making TTM less accurate. This pattern reflects the typical trade-off between model size and predictive capability: being more compact and efficient, TTM maintain good performance but lose some accuracy under zero-shot conditions.

Conclusions

All three models provide solid forecasts, without any one model showing a clear, decisive advantage over the others. However, the results indicate that prediction accuracy slightly decreases as model size decreases: TimeGPT tends to show the lowest average errors, followed by TimesFM and finally TTM. This is expected: models with more parameters are generally better at adapting to diverse contexts in zero-shot settings, resulting in superior performance.

It is important to emphasize that the strength of foundation models does not lie solely in their zero-shot forecasting ability, but also in the possibility of fine-tuning them on the target dataset to further improve performance in a specific scenario.
To complete the comparison, it will therefore be necessary to fine-tune all three models analyzed: only then will it be possible to fully assess their predictive potential on hourly PUN data. This is a natural next step for future work.


[1] For an analysis of the structure of the three models and zero-shot forecasts, see the article: The arrival of foundational models in time series forecasting.
[2] For details on the predictive cycle structure, see the paragraph “Forecasting cycle” in the article: From Econometrics to Machine Learning: The Challenge of Forecasting.
[3] The structure of TTM is described here: Machine Learning in time series forecasting: introducing TinyTime Mixers.