Authors: Li Li, Yanfei Kang, Feng Li
Achieving robust and accurate forecasts is a central focus in finance and econometrics. Forecast combination has been adopted as an essential tool for improving time series forecasting performance during recent decades (Bergmeir et al., 2016, Garratt et al., 2019, Kolassa, 2011), owing to its ability to reduce the risk of selecting a single, possibly inappropriate model and to produce more accurate and robust forecasts (Jose & Winkler, 2008). See Wang, Hyndman, Li, and Kang (2022) for a comprehensive survey of this area. Although overall forecasting uncertainty is driven by model, data, and parameter uncertainty, merely tackling model uncertainty can deliver most of the performance benefits (Petropoulos, Hyndman, & Bergmeir, 2018). Forecast combination is one such instrument for reducing overall forecasting uncertainty, with the focus on finding optimal weights for the different forecasting models in the combination (Kang et al., 2022, Kang, Hyndman et al., 2020, Montero-Manso et al., 2020).
Nevertheless, most existing forecast combination approaches have been limited to point forecasts, which restricts their use in decision-making problems, especially in domains like economics and business planning. In finance and economics, investigators rely on the complete picture of uncertainty provided by density forecasts (Kascha & Ravazzolo, 2010). See Tay and Wallis (2000) and Timmermann (2006) for reviews of density forecasts in finance and economics.
Early research already showed that simply applying a strategy similar to point forecast combination to density combination generally improves forecasting performance compared with choosing a particular model (Kascha and Ravazzolo, 2010, Liu and Maheu, 2009). Density forecast combinations have therefore attracted broad attention in recent years (Ciccarelli and Hubrich, 2010, Opschoor et al., 2017), mainly focusing on how to weight the different forecast densities and update the weights over time (Aastveit, Mitchell, Ravazzolo, & Van Dijk, 2018).
Wallis (2005) started this line of work by proposing a finite mixture distribution to combine density forecasts. Hall and Mitchell (2007) then devised a weighted linear combination by minimizing the distance between the forecast density and the true-but-unknown density based on the logarithmic scoring rule. Pauwels and Vasnev (2016) designed a series of simulation experiments to examine the properties of the optimization problem in Hall and Mitchell (2007). Kascha and Ravazzolo (2010) considered different combining and weighting schemes for inflation density forecasts evaluated by the average log score. Jore, Mitchell, and Vahey (2010) developed a combination strategy for autoregressive models based on log-score recursive weights. Aastveit, Gerdrup, Jore, and Thorsrud (2014) showed that the combined density scheme of Jore et al. (2010) performed better and more robustly than the component models in both log scores and calibration tests. A notable approach in the literature is the “optimal prediction pools” (“OP” hereafter) proposed by Geweke and Amisano (2011), who used a linear pool to obtain the optimally weighted density combination under scoring criteria. They used the historical performance of the forecasting models in the pool to determine the weights by maximizing log predictive scores.
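To make the OP idea concrete, here is a minimal sketch (our own illustration, not Geweke and Amisano's code; the function name and the softmax reparameterization of the weight simplex are our choices) of estimating fixed pool weights by maximizing the average log predictive score:

```python
import numpy as np
from scipy.optimize import minimize

def optimal_pool_weights(pred_densities):
    """Estimate fixed linear-pool weights in the spirit of optimal
    prediction pools: pred_densities[t, k] is model k's predictive
    density evaluated at the realized value y_t."""
    T, K = pred_densities.shape

    def neg_log_score(theta):
        # softmax keeps the weights non-negative and summing to one
        w = np.exp(theta - theta.max())
        w /= w.sum()
        return -np.mean(np.log(pred_densities @ w + 1e-300))

    res = minimize(neg_log_score, np.zeros(K), method="Nelder-Mead")
    w = np.exp(res.x - res.x.max())
    return w / w.sum()
```

If one model's in-sample density values dominate, the optimizer pushes its weight toward one, reflecting the historical-performance logic described above.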
Although these combination methods improve accuracy over single models, some apparent disadvantages remain. The forecast performance of different models may change over time, so a combination method with constant weights may not be optimal. We call this “forecast combination uncertainty”. To cope with this challenge in density forecasting, combination methods with time-varying weights have also been studied. For example, Waggoner and Zha (2012) explored regime-dependent weights and the time-varying importance of two macroeconomic models. Casarin, Grassi, Ravazzolo, and Van Dijk (2013) learned time-varying combination weights from past forecasting performance and other mechanisms. Kapetanios, Mitchell, Price, and Fawcett (2015) put forward the “generalized pools” (“GP” hereafter), extending the OP method (Geweke & Amisano, 2011) with a more general scheme for the combination weights. Kapetanios et al. (2015) used piecewise linear weight functions to let the weights depend on the regions of the distribution and showed that GP produces more accurate forecasts than optimal combinations with fixed weights (Geweke and Amisano, 2011, Hall and Mitchell, 2007). “Dynamic pools” (Del Negro, Hasegawa, & Schorfheide, 2016) were then proposed, relying on a sequence of time-varying weights for the combination of two Dynamic Stochastic General Equilibrium (DSGE) models. Recently, McAlinn and West (2019) provided a Bayesian Predictive Synthesis (BPS) framework, encompassing several existing forecast density combination methods, including those of Geweke and Amisano (2011) and Kapetanios et al. (2015).
The aforementioned variants of density forecast combination mainly focus on forecasting model uncertainty, neglecting the uncertainty or characteristics of the time series itself. In addition, those methods often lack interpretability: they directly obtain the optimal combination weights without explaining which features of the time series affect the weights. Feature-based time series forecasting has found numerous applications over the years. Wang, Smith-Miles, and Hyndman (2009) derived recommendation rules by learning the relationship between time series features and the suitability of forecasting methods. Petropoulos, Makridakis, Assimakopoulos, and Nikolopoulos (2014) studied the influence of seven time series features on forecast accuracy and provided helpful conclusions for method selection. Talagala, Li, and Kang (2022) developed a random forest classifier to select the best forecasting model based on 42 time series features under a meta-learning framework. Montero-Manso et al. (2020) then used the same 42 features to select the weight of each forecasting model, proposing a framework called FFORMA (Feature-based FORecast Model Averaging).
However, recently developed forecast combination methods, especially machine-learning-based ones, are usually black boxes: the internal logic of the combination weights is hard to explain due to the complexity of the algorithms. For instance, FFORMA (Montero-Manso et al., 2020) used 42 expert-selected features to determine the combination weights with an XGBoost algorithm and ranked second in the M4 competition, but the relation between the features and the weights cannot be interpreted because of the “black-box” learning algorithm. In this paper, we study this problem from a perspective orthogonal to the existing literature, that is, we explain time-varying weights through time-varying features of the time series. Furthermore, our method handles forecast uncertainties from different aspects. First, combining different models reduces model uncertainty. Second, combining with time-varying weights deals with forecast combination uncertainty. Third, time series features are applied to capture data uncertainty. We define the time-varying weights by a softmax transform of a linear function of time series features and redefine the log predictive score function (Geweke & Amisano, 2011). The main extension of our method in the scoring function is that the weights are determined by features and vary over time. The optimal weights are then obtained by maximizing the historical log predictive scores in the pool, as in Hall and Mitchell (2007), Geweke and Amisano (2011), Kapetanios et al. (2015) and Del Negro et al. (2016). We estimate the unknown parameters in the weights by the maximum-a-posteriori (MAP) method, taking prior knowledge into account.
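The weighting scheme just described can be sketched as follows (an illustrative simplification with hypothetical names, not the paper's exact implementation or its MAP estimation code):

```python
import numpy as np

def time_varying_weights(features, beta):
    """Softmax transform of a linear function of time series features:
    features is (T, p), beta is (p, K); returns (T, K) weights where
    row t gives the K model weights at time t."""
    scores = features @ beta
    scores = scores - scores.max(axis=1, keepdims=True)  # stability
    expw = np.exp(scores)
    return expw / expw.sum(axis=1, keepdims=True)

def log_predictive_score(features, beta, pred_densities):
    """Feature-dependent log predictive score to be maximized over
    beta; pred_densities[t, k] is model k's density at the realized y_t."""
    w = time_varying_weights(features, beta)
    return np.sum(np.log((w * pred_densities).sum(axis=1) + 1e-300))
```

With beta set to zero the weights reduce to the simple average; MAP estimation of beta would add a log-prior term to this objective before optimizing.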
Given this time-varying weighting scheme, a challenge is to choose relevant features for the forecast combination and to interpret the importance of different features. Choosing features based only on intuition or expertise may lead to feature selection bias, especially when forecasters’ information is inadequate. Meanwhile, a vast number of time series features have been proposed in the recent literature and software, and practitioners cannot always pick the right set of features. Putting all possible features into the combination model not only scales up the computational difficulty but also reduces the efficiency of variable selection, since straightforward Bayesian variable selection is known to perform poorly on very large variable sets.
Inspired by statistical screening methods for variable selection, we introduce an initial screening process that determines candidate features from a larger feature set, using the ReliefF algorithm (Kononenko, 1994) to pick out a subset of features that discriminates between the forecasting performance of different models. We then introduce an automatic Bayesian variable selection method to weight the contribution of the selected features.
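A minimal ReliefF-style screening sketch (a simplified one-nearest-neighbor variant for illustration, not Kononenko's full algorithm): features that separate the classes, e.g. which model forecasts a series best, receive higher scores:

```python
import numpy as np

def relieff_scores(X, y):
    """X: (n, p) features scaled to [0, 1]; y: (n,) class labels.
    For each instance, reward features that differ at the nearest
    other-class neighbor (miss) and penalize features that differ
    at the nearest same-class neighbor (hit)."""
    n, p = X.shape
    scores = np.zeros(p)
    for i in range(n):
        d = np.abs(X - X[i]).sum(axis=1)  # L1 distance to every row
        d[i] = np.inf                     # exclude the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, d, np.inf))
        miss = np.argmin(np.where(~same, d, np.inf))
        scores += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return scores / n
```

Features with scores near zero behave like noise and can be dropped before the Bayesian variable selection step.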
The proposed framework has five principal advantages: (1) our approach is more comprehensible than black-box forecast combinations, as it not only interprets which features determine the combination weights but also identifies the importance of different features; (2) the combination weights can vary over time based on time-varying features and handle diverse uncertainties from the model, the forecast combination and the data; (3) a complete Bayesian framework is formed, and prior information about the combination is taken into consideration; (4) the framework is computationally efficient, because some steps can be calculated in an offline phase and our algorithm is easy to parallelize over large time series sets; and (5) our framework can produce both point and density forecasts, one step or multiple steps ahead, which makes it more flexible than the OP (Geweke & Amisano, 2011) and GP (Kapetanios et al., 2015) methods.
Density forecast combinations with fixed and time-varying weights have attracted growing attention recently. Our proposed framework enriches this vein by obtaining the time-varying weights from aggregated time-varying features. To the best of our knowledge, this is the first time features are taken into consideration for time-varying forecast combinations. In contrast to black-box forecast combination schemes, our proposed framework is interpretable in three ways: (1) the time-varying weights at each time point can be expressed by time series features calculated from historical data; (2) the variation of the time-varying weights can be explained by the trend of the related time-varying features; and (3) the contribution of different features to the forecast combination can be measured through an automatic Bayesian variable selection method within the proposed framework. We summarize the characteristics of our approach in Table 9 based on a comparison with the SA, OP, GP, and FFORMA methods.
Our framework shows great superiority on both the S&P 500 and the M3 competition data, indicating that our method is applicable to data of various lengths and from diverse fields. Furthermore, the proposed framework trains the optimal parameters based on simple statistical features and historical forecasting performance, with no need for massive data to learn the relationship between weights and features, in contrast to some machine learning methods (Li et al., 2020, Montero-Manso et al., 2020, Wang et al., 2021).
Computational efficiency also needs attention, especially when the data set is large or the historical data are long. Because each time series is forecast independently, the proposed framework is parallelizable. In particular, the training period of our framework is often the most time-consuming part, on account of the calculation of the predictive densities and features; we can complete this process in an offline phase. In addition, we further shorten the computing time by using rolling samples when forecasting returns data.
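Since each series is handled independently, the pipeline maps onto an embarrassingly parallel loop. A minimal sketch (the per-series function is a hypothetical placeholder, and a process pool would typically replace the thread pool for CPU-bound work):

```python
from concurrent.futures import ThreadPoolExecutor

def forecast_one(series):
    # placeholder for the per-series pipeline (compute features,
    # estimate weights, combine densities); returns the mean here
    return sum(series) / len(series)

def forecast_all(series_list, max_workers=4):
    # each series is independent, so a simple pool map suffices
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(forecast_one, series_list))
```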
Although we formulated a full Bayesian scheme for time-varying forecast combination, the inference of the coefficients of the time series features is based on a simple MAP scheme, and the variable selection process is also simple. Variational inference and stochastic-gradient-based methods (Chen et al., 2014, Welling and Teh, 2011) could be further explored. The proposed framework is designed for density combinations, but potential users may have forecasts available only in the form of a sample from the predictive distribution. In these circumstances, the variance of the forecast error needs to be estimated through techniques such as the bootstrap or ensemble learning to form an empirical distribution. Our framework could thus be extended to more application scenarios.
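For instance, when a component model returns only draws from its predictive distribution, a kernel density estimate could supply the density values the pool needs. A sketch (our own illustration with Silverman's rule-of-thumb bandwidth, not part of the proposed framework):

```python
import numpy as np

def kde_log_density(samples, y, bandwidth=None):
    """Gaussian KDE evaluated at the realized value y, turning a
    predictive sample into an approximate density value."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        bandwidth = 1.06 * samples.std() * n ** (-0.2)  # Silverman
    z = (y - samples) / bandwidth
    dens = np.exp(-0.5 * z ** 2).sum() / (n * bandwidth * np.sqrt(2 * np.pi))
    return np.log(dens + 1e-300)
```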
Furthermore, the objective of our framework is to obtain the optimal weights in forecast combinations. The current work does not address selecting an appropriate pool of forecasting models, known as trimmed linear pooling (Grushka-Cockayne, Jose, & Lichtendahl Jr, 2017) or forecast pooling (Kourentzes, Barrow, & Petropoulos, 2019). In the two experiments of Section 4 (forecasting stock market data) and Section 5 (forecasting the monthly M3 data), we traverse all possible subsets of forecasting models and find some interesting results. For instance, combining all models is not the best strategy: some combinations improve significantly over individual models, while the advantages of other combinations are not obvious. Therefore, constructing an appropriate forecast pool before applying the proposed framework is essential, especially when abundant alternative models are available.