From Econometrics to Machine Learning: The Challenge of Forecasting

The Case of the Hourly National Single Price (PUN)

In recent years, artificial intelligence and Machine Learning have increasingly spread into the field of data analysis, positioning themselves as alternative tools to traditional statistical and econometric models for predicting and estimating relationships between variables.
This raises a question: do traditional structural and econometric models still guarantee better performance, or have Machine Learning systems become more effective?

In data analysis, three different methodological approaches can be distinguished:

  1. Traditional Econometric Approach: the analysis is based on a theoretical structure explicitly defined by the user (mathematical equation, assumptions, etc.) and is then carried out according to the specified model.
  2. Machine Learning Approach: the analysis is performed autonomously by the program, without the user imposing any econometric structure. The model learns directly from the data, optimizing its predictive performance.
  3. Hybrid Approach: the user defines some components of the model but allows the program to complete the analysis using Machine Learning techniques.

The objective of this study is to compare the predictive performance of these different systems on the Hourly National Single Price (PUN) in the Italian electricity market, examining four representative models and using the Root Mean Squared Error (RMSE) as the evaluation metric. A structural model developed specifically for the PUN serves as the benchmark, thanks to the level of detail and depth with which it was built.

Forecasting Cycle

To perform the comparison, a rolling window forecasting cycle is used. Specifically, each model produces forecasts up to the following Friday at 23:00, starting from three different initial periods:

  • P7: weekly forecast, with data available up to Thursday of the previous week at 23:00;
  • P5: five-day forecast, with data available up to Sunday of the previous week at 23:00;
  • P3: three-day forecast, with data available up to Tuesday of the same week at 23:00.

With this approach, forecasts are generated for three different time horizons.
The cycle is repeated over a sample of 16 consecutive weeks: starting from May 30, 2025, the date of the first forecast, and continuing week by week until September 12, 2025.
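The calendar of the cycle can be sketched in a few lines. This is only an illustration, not the code used in the study: the offsets assume that each cutoff is the Thursday, Sunday, or Tuesday immediately preceding the forecast horizon that ends on the target Friday (8, 5, and 3 days before it), and the names `CUTOFF_OFFSETS` and `forecast_schedule` are hypothetical.

```python
from datetime import datetime, timedelta

FIRST_TARGET = datetime(2025, 5, 30, 23)  # Friday, May 30, 2025 at 23:00
N_WEEKS = 16

# Days between each data cutoff and the target Friday.
# Assumed reading of the text: Thursday (P7), Sunday (P5), Tuesday (P3).
CUTOFF_OFFSETS = {"P7": 8, "P5": 5, "P3": 3}


def forecast_schedule(first_target=FIRST_TARGET, n_weeks=N_WEEKS):
    """Return one dict per week with the target time and the three data cutoffs."""
    schedule = []
    for week in range(n_weeks):
        target = first_target + timedelta(weeks=week)
        row = {"target": target}
        for label, days_back in CUTOFF_OFFSETS.items():
            row[label] = target - timedelta(days=days_back)
        schedule.append(row)
    return schedule


schedule = forecast_schedule()
for row in schedule[:2]:
    print({name: moment.strftime("%a %Y-%m-%d %H:%M") for name, moment in row.items()})
```

Each row of `schedule` identifies one weekly repetition: three forecasts with different information sets, all evaluated up to the same Friday at 23:00.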

Then, for each model, RMSE values are calculated for the P7, P5, and P3 forecasts, making it possible to evaluate how model performance changes as the forecast horizon increases. These values, repeated for all weeks considered, generate three statistical distributions that summarize the predictive performance of each system across different time horizons.
From the analysis of these distributions, key indicators can be derived: the mean (RMSE_mean), which identifies the model that is on average the most accurate for each forecast horizon; and the maximum (RMSE_max), which provides a measure of the risk associated with the largest errors. Analyzing the error distributions allows the identification of systems that, while generally accurate, may occasionally produce extreme errors, and conversely, more stable models even if slightly less precise on average.
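A minimal sketch of these indicators is given below. The RMSE formula is standard. The percentage version (RMSE divided by the mean actual price) is an assumption, since the article does not state how RMSE% is defined. The prices are toy numbers invented for illustration, not PUN data.

```python
import math


def rmse(actual, forecast):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_pct(actual, forecast):
    """RMSE as a percentage of the mean actual price (assumed definition of RMSE%)."""
    return 100.0 * rmse(actual, forecast) / (sum(actual) / len(actual))


def summarize(weekly_rmse):
    """Collapse one RMSE per week into the two indicators used in the text."""
    return {
        "RMSE_mean": sum(weekly_rmse) / len(weekly_rmse),
        "RMSE_max": max(weekly_rmse),
    }


# Toy (actual, forecast) pairs for two weeks, invented for illustration only.
weeks = [
    ([100.0, 110.0, 120.0], [103.0, 106.0, 120.0]),
    ([90.0, 95.0, 100.0], [90.0, 95.0, 112.0]),
]
weekly = [rmse(actual, forecast) for actual, forecast in weeks]
summary = summarize(weekly)
print(summary)
```

In the toy data the second week contains a single large miss, which raises RMSE_max far more than RMSE_mean. This is the kind of occasional extreme error the maximum is meant to expose.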

Structural Econometric Model

The developed model aims to explain and forecast the hourly National Single Price (PUN) of the Italian electricity market, taking into account both the main known drivers and systematic components related to the hour of the day.

The dependent variable is the hourly PUN, expressed in €/MWh. Among the main independent variables are the daily natural gas price at the PSV trading point, the price of CO₂ emission permits, the total hourly electricity demand in Italy, and the hourly share of renewable energy production (RES), represented by two distinct variables: SFER_LOW for shares below 70%, and SFER_HIGH for shares equal to or above 70%.
To isolate systematic variations associated with the hour of the day, the model also includes 23 dummy variables, one for each hour except for the reference hour. These allow estimating the average specific effect of each hour on the PUN.[1]
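As an illustration of how such regressors could be laid out for a single hourly observation, consider the sketch below. It is not the authors' specification: the coding of SFER_LOW and SFER_HIGH (each carrying the share only in its own regime), the choice of hour 0 as the reference hour, the column names, and the input values are all assumptions.

```python
RES_THRESHOLD = 0.70  # 70% share of renewable production, from the text
REFERENCE_HOUR = 0    # hypothetical choice; the article does not name the reference hour


def build_regressors(hour, psv_gas, co2, demand, res_share):
    """Return the regressor row for one hourly observation as a dict."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    row = {
        "PSV": psv_gas,
        "CO2": co2,
        "DEMAND": demand,
        # Assumed coding: each variable carries the share only in its own regime.
        "SFER_LOW": res_share if res_share < RES_THRESHOLD else 0.0,
        "SFER_HIGH": res_share if res_share >= RES_THRESHOLD else 0.0,
    }
    # 23 hourly dummies: one per hour, omitting the reference hour.
    for h in range(24):
        if h != REFERENCE_HOUR:
            row[f"H{h:02d}"] = 1.0 if h == hour else 0.0
    return row


# Invented input values, for illustration only.
row = build_regressors(hour=13, psv_gas=38.0, co2=70.0, demand=30000.0, res_share=0.75)
print(row["SFER_LOW"], row["SFER_HIGH"], row["H13"])
```

Omitting one hour avoids perfect collinearity with the intercept, so each dummy coefficient measures the average difference from the reference hour.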

After estimating the long-term and short-term components, the structural model is used to generate Out-Of-Sample forecasts over three different time horizons. The resulting RMSE values are as follows:

Horizon   RMSE_mean (€/MWh)   RMSE_max (€/MWh)   RMSE%_mean (%)   RMSE%_max (%)
P7        14.09               26.35              13.09            27.51
P5        13.87               27.78              12.82            29.47
P3        11.03               20.64              9.59             16.70

These results indicate that the model's mean RMSE ranges between about 11 and 14 €/MWh (roughly 10% to 13% in percentage terms), depending on the forecast horizon, while the maximum weekly RMSE lies between about 21 and 28 €/MWh. Considering the variability of the hourly PUN observed over the summer, these are strong results.

As expected, the average RMSE decreases as the forecast horizon shortens. The maximum RMSE follows a similar trend, with some fluctuations: the highest value is observed for P5.
The structural model demonstrates strong predictive power, especially over short horizons, with low errors both in absolute and percentage terms. The increase in error over longer horizons is consistent with the nature of the problem and suggests that medium- to long-term PUN fluctuations are harder to capture, whereas short-term forecasts are more reliable.

The quality of the results depends not only on how accurately the short- and long-term equations reproduce the relationship between the explanatory variables and the hourly PUN, but also on the correspondence between the values used in the equations and those actually observed. In this case, the explanatory variables were used in their actual values, thus creating optimal conditions for accurate forecasts of the hourly PUN.
It is important to emphasize that, because the explanatory variables enter at their actual rather than forecast values, the results of this structural econometric model should not be read as those of a true ex-ante forecast, but rather as an optimal benchmark against which to compare the other models presented below.

Machine Learning Model: Tiny Time Mixers

The Tiny Time Mixers (TTM) model is an open-source system developed by IBM Research for time series forecasting, based on advanced Machine Learning algorithms.
The out-of-sample forecast of the PUN was performed in zero-shot mode, meaning that no training was conducted on Italian electricity market data. Instead, the model relied on what it had learned during the pre-training conducted by its developers, with no parameter updated through additional training. The model’s performance is therefore entirely the product of Machine Learning, with no modelling input from the user.
A detailed description of this model will be the subject of a future article.

The Out-Of-Sample forecast produced the following results:

Horizon   RMSE_mean (€/MWh)   RMSE_max (€/MWh)   RMSE%_mean (%)   RMSE%_max (%)
P7        19.14               30.71              17.68            33.50
P5        16.55               25.50              14.49            25.05
P3        13.81               23.89              12.21            24.77

The TTM model is less precise than the structural model but remains reasonably accurate. Its average RMSE is higher at every horizon (by about 2.7 to 5.1 €/MWh), yet the values remain acceptable, particularly as the error decreases noticeably when the forecast horizon shortens. The same trend applies to the maximum RMSE: shorter horizons see reduced peaks, indicating that the model copes better when there is less uncertainty to capture.

The TTM proves to be a flexible and competitive approach, especially considering it was used without any specific training on the PUN. This suggests that with targeted fine-tuning or dedicated training, the model could further improve its performance and approach that of the structural model, making it a particularly valid alternative.

Econometric Model: SARIMA

The SARIMA (Seasonal Autoregressive Integrated Moving Average) model is an extension of the traditional ARIMA model, specifically designed for time series with seasonal patterns. Its structure combines autoregression (AR), integration (I), moving average (MA), and seasonality (S), allowing it to capture both short-term and long-term dependencies.
Formally, it is expressed as SARIMA(p, d, q)(P, D, Q, s), where p, d, and q are the non-seasonal autoregressive, differencing, and moving-average orders, P, D, and Q are their seasonal counterparts, and s is the length of the seasonal cycle.
The user must manually specify the model order, distinguishing between the seasonal and non-seasonal parts. In our case, the out-of-sample forecasts were generated using a SARIMA(1, 0, 1)(1, 1, 1, 24) model, where s = 24 corresponds to the daily cycle of the hourly data.
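Written out with the backshift operator B (where B y_t = y_{t-1}), this specification corresponds to the equation below. The symbols \phi_1, \Phi_1, \theta_1, \Theta_1, and \varepsilon_t are standard textbook notation rather than the article's own, and the sign convention for the moving-average terms varies between references.

```latex
(1 - \phi_1 B)\,(1 - \Phi_1 B^{24})\,(1 - B^{24})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{24})\, \varepsilon_t
```

The factor (1 - B^{24}) is the single seasonal difference (D = 1): each hourly price is compared with the price at the same hour of the previous day. Since d = 0, no ordinary differencing is applied.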

The results in terms of RMSE are reported below:

Horizon   RMSE_mean (€/MWh)   RMSE_max (€/MWh)   RMSE%_mean (%)   RMSE%_max (%)
P7        19.95               32.15              18.48            35.07
P5        17.15               27.44              15.03            25.23
P3        15.22               32.52              13.39            28.42

The results show a predictive capacity that, as with the other models, improves on average as the forecasting horizon shortens, but that remains less accurate overall than both the structural model and the TTM.
The mean RMSE is slightly higher than that of the TTM at every horizon and falls progressively from the weekly to the three-day forecast.
A particularly relevant aspect concerns the maximum RMSE, which exceeds that of the other two models at P7 and P3 and, unlike the mean, does not fall at the shortest horizon: the P3 peak (32.52 €/MWh) is the highest of the three. This indicates that, despite decent average accuracy, the model can produce substantial errors, especially during periods of strong PUN volatility.

Hybrid Model: Prophet

The Prophet model, developed by Meta[2], represents an example of a hybrid approach to data analysis. The model assumes an additive decomposition of the time series, where the observation at time t can be expressed as follows:
y(t) = trend(t) + seasonality(t) + holidays(t) + error(t).

At the same time, Prophet automatically detects trend change points: it places a set of candidate change points along the series and uses a sparse Bayesian prior so that only those supported by the data shift the trend, without requiring the user to specify them manually. This makes the model adaptive to patterns in the data, while the user defines only its overall structure.
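The additive structure and the role of change points can be illustrated with a toy sketch. This is not Prophet's code or API: it only mirrors the piecewise-linear trend formulation described in the Prophet paper, and the function names, parameter values, and the 24-hour sine cycle are invented for illustration.

```python
import math


def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend with base slope k and offset m; the slope shifts by deltas[j] at changepoints[j]."""
    slope, offset = k, m
    for s_j, delta_j in zip(changepoints, deltas):
        if t >= s_j:
            slope += delta_j
            offset -= s_j * delta_j  # keeps the trend continuous at s_j
    return slope * t + offset


def trend(t):
    # One change point at t = 48 hours, where the slope drops from 0.5 to 0.25.
    return piecewise_linear_trend(t, k=0.5, m=100.0, changepoints=[48.0], deltas=[-0.25])


def daily_seasonality(t):
    return 10.0 * math.sin(2 * math.pi * t / 24)  # toy 24-hour cycle


def holidays(t):
    return 0.0  # no holiday effect in this toy example


def additive_model(t):
    """y(t) without the error term: trend(t) + seasonality(t) + holidays(t)."""
    return trend(t) + daily_seasonality(t) + holidays(t)


y_hat = [additive_model(t) for t in range(72)]
print(trend(47.0), trend(48.0), trend(49.0))
```

In Prophet the slope adjustments are estimated from the data rather than set by hand. The sparse prior shrinks most of them toward zero, which is why few candidate change points end up mattering.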

The Out-Of-Sample forecasts produced the following results:

Horizon   RMSE_mean (€/MWh)   RMSE_max (€/MWh)   RMSE%_mean (%)   RMSE%_max (%)
P7        22.49               56.74              20.51            49.51
P5        18.41               38.69              16.10            35.14
P3        17.97               30.54              15.75            28.30

The hybrid Prophet model shows the weakest results of the models tested. Its average RMSE is the highest at every horizon, and the maximum RMSE highlights the model’s fragility in high-variability contexts, with a very large peak for P7 (56.74 €/MWh).
Prophet is certainly simple to implement as an exploratory tool, but it is not competitive for accurate forecasting in such a volatile market.

Conclusion

The comparison confirms the robustness of the structural model as a reference tool for PUN forecasting, but other models represent interesting alternatives, especially when faster setup and broader adaptability are desired.
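The ranking behind this conclusion can be reproduced directly from the mean RMSE values reported in the four tables. The sketch below only rearranges those published numbers. The helper names `ranking` and `gap_to_benchmark` are hypothetical.

```python
# Mean RMSE (€/MWh) per horizon, copied from the four tables in this article.
RMSE_MEAN = {
    "Structural": {"P7": 14.09, "P5": 13.87, "P3": 11.03},
    "TTM": {"P7": 19.14, "P5": 16.55, "P3": 13.81},
    "SARIMA": {"P7": 19.95, "P5": 17.15, "P3": 15.22},
    "Prophet": {"P7": 22.49, "P5": 18.41, "P3": 17.97},
}
BENCHMARK = "Structural"


def ranking(horizon):
    """Models ordered from lowest to highest mean RMSE for one horizon."""
    return sorted(RMSE_MEAN, key=lambda model: RMSE_MEAN[model][horizon])


def gap_to_benchmark(model, horizon):
    """Extra mean RMSE (€/MWh) of a model relative to the structural benchmark."""
    return round(RMSE_MEAN[model][horizon] - RMSE_MEAN[BENCHMARK][horizon], 2)


for horizon in ("P7", "P5", "P3"):
    gaps = {m: gap_to_benchmark(m, horizon) for m in ranking(horizon) if m != BENCHMARK}
    print(horizon, ranking(horizon), gaps)
```

The ordering is the same at all three horizons: structural model, TTM, SARIMA, Prophet.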

The TTM model, while not achieving the same overall accuracy as the structural model, demonstrates good predictive capability, particularly for short-term forecasts. Moreover, since it was used in zero-shot mode without any specific training on the PUN, targeted fine-tuning could further improve its performance and bring it closer to, or in certain scenarios even beyond, that of the structural model. In that case, however, it would no longer be a purely Machine Learning-based approach in the sense used here, since the user would intervene in the training.

SARIMA and Prophet models are less precise in this context but remain easy to interpret, unlike the TTM.


[1] A detailed description of the structural model can be found in the article: Hourly PUN price: Is there a specific “hour” effect?
[2] Prophet – Forecasting at scale