NSFC Project (2022-2024)
Simultaneous forecasting of large-scale time series: a global modeling perspective
With the advent of Big Data, many applications now require forecasting thousands or millions of related time series, rather than individual series or small collections. Beyond their sheer quantity, these data pose challenges such as strong cross-series correlation, complex distributions and limited historical information. In most classical forecasting methods, the model parameters for each time series are estimated independently from its own past observations, which neither borrows information from related series nor works when only limited history is available. More complex machine learning methods do not necessarily produce better forecasts than simpler ones, due to problems such as over-fitting and non-stationarity. In this project, we propose to forecast all the time series in a dataset simultaneously by building global models, which can learn complex patterns across multiple series while avoiding over-fitting and wasted computation. The project studies global-model-based forecasting from three perspectives: feature-based forecast combination of large-scale related time series, global modeling of large-scale intermittent demand data, and forecasting methods for very short time series. Finally, the project intends to apply the proposed global-model-based forecasting methods to the challenges brought by the surge of time series data in large-scale management practice, and to support better management decisions.
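The core idea of a global model — one set of parameters estimated on the pooled observations of many related series, rather than one model per series — can be sketched as follows. This is a minimal illustration, not the project's actual method; the simulated data, the lag order, and the choice of a pooled linear autoregression are assumptions made for the example.

```python
import numpy as np

def make_lagged(series, p):
    """Build (X, y) pairs mapping p lagged values to the next value."""
    X = np.array([series[i:i + p] for i in range(len(series) - p)])
    y = series[p:]
    return X, y

def fit_global_ar(series_list, p=3):
    """Fit ONE linear autoregression on the pooled windows of ALL series."""
    Xs, ys = zip(*(make_lagged(s, p) for s in series_list))
    X = np.vstack(Xs)
    y = np.concatenate(ys)
    X = np.hstack([X, np.ones((len(X), 1))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast_next(series, coef, p=3):
    """One-step-ahead forecast for a single series using the shared model."""
    x = np.concatenate([series[-p:], [1.0]])
    return float(x @ coef)

# Many short, related series: noisy realizations of a shared AR(1) pattern.
rng = np.random.default_rng(0)
series_list = []
for _ in range(200):
    s = [rng.normal()]
    for _ in range(20):
        s.append(0.8 * s[-1] + rng.normal(scale=0.3))
    series_list.append(np.array(s))

coef = fit_global_ar(series_list, p=3)
print(forecast_next(series_list[0], coef, p=3))
```

Because each series here contributes only 18 training windows, a per-series model would be poorly estimated; pooling 200 series gives the shared coefficients 3,600 observations, which is exactly the "borrowing of information" the project targets.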
Alibaba Innovative Research Project (2021-2022)
Forecasting methods for complex time series in e-commerce
Details are available at https://damo.alibaba.com/air/1780.
NSFC Project (No. 11701022, 2018-2020)
Instance spaces for time series forecasting
Our confidence in the future performance of any algorithm, including time series forecasting algorithms, depends on how carefully we select test instances so that the generalization of algorithm performance to future instances can be inferred. It is common practice to evaluate forecasting methods on collections of well-studied time series datasets. The question, though, is how diverse and challenging these time series are, and whether they enable us to study the unique strengths and weaknesses of different forecasting methods. Noting that the diversity of many benchmarking datasets has been questioned, this project first establishes a methodology for generating a two-dimensional instance space comprising known time series instances. This instance space exposes the similarities and differences between instances through measurable features or properties, and enables the performance of forecasting algorithms to be viewed across the space, where generalizations can be inferred; the diversity of the time series in an existing collection can then be assessed. Second, the project proposes a method for generating new time series with controllable characteristics by filling observed gaps in the instance space. This enables the generation of rich new sets of time series instances, making generalizations of forecasting method performance as robust as possible. Finally, we obtain insights into algorithm strengths and weaknesses by examining the regions of the instance space where strong or weak performance can be expected.
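The construction of an instance space can be sketched in miniature: compute measurable features for each time series, then project the feature matrix to two dimensions. The three features and the PCA projection below are illustrative assumptions; real instance spaces use much richer feature sets and carefully chosen projections.

```python
import numpy as np

def features(s):
    """A few illustrative time series features (real studies use many more)."""
    s = np.asarray(s, dtype=float)
    t = np.arange(len(s))
    acf1 = np.corrcoef(s[:-1], s[1:])[0, 1]   # lag-1 autocorrelation
    trend = np.polyfit(t, s, 1)[0]            # linear trend slope
    cv = s.std() / (abs(s.mean()) + 1e-9)     # coefficient of variation
    return np.array([acf1, trend, cv])

def instance_space(series_list):
    """Project the standardized feature matrix to 2-D via PCA (SVD)."""
    F = np.array([features(s) for s in series_list])
    F = (F - F.mean(0)) / F.std(0)            # standardize each feature
    U, S, Vt = np.linalg.svd(F, full_matrices=False)
    return F @ Vt[:2].T                       # 2-D coordinates per instance

rng = np.random.default_rng(1)
smooth = [np.cumsum(rng.normal(size=50)) for _ in range(30)]  # trended walks
noisy = [rng.normal(size=50) for _ in range(30)]              # white noise
coords = instance_space(smooth + noisy)
print(coords.shape)  # (60, 2)
```

In such a plot, the random-walk and white-noise instances separate into distinct regions; overlaying a forecasting method's accuracy on these coordinates is what lets strong and weak regions be identified, and empty regions indicate where new instances could be generated.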
NSFC Project (No. 11501587, 2016-2018)
Efficient Bayesian flexible density methods with high dimensional financial data
Bayesian flexible modeling of high-dimensional densities is a state-of-the-art topic in Bayesian methodology. Financial data have unique features: they are time-dependent, high-dimensional, non-Gaussian, and heavily correlated across variables. Tremendous research has been done on continuous financial data in low dimensions, and recent research has also found the importance of textual data accompanying financial events. Unfortunately, there is still a lack of research on modeling high-dimensional data that combines textual and continuous data. This is partly because constructing high-dimensional densities with both continuous and discrete margins is not yet efficient. Usual statistical inference tools are unlikely to succeed in that setting because of the curse of dimensionality, especially when there are more than a couple of margins.
In this project, we propose a general approach for modeling data features in a high-dimensional density with flexible continuous and discrete marginal densities. Our approach begins with a two-dimensional copula density in which the rank correlation and tail-dependence coefficients are connected with covariates via smooth functions, and in which the two marginal densities are a finite mixture of Student-t densities and a Poisson density, respectively. We propose a highly efficient MCMC algorithm that updates all the marginal and joint density features jointly. We also propose an efficient stochastic search over margin permutations that constructs a high-dimensional flexible copula density from flexible bivariate copulas. Unlike the usual reversible-jump MCMC in the literature, which depends heavily on the choice of prior and can update only one margin at a time, our algorithm jointly updates the joint multivariate density with an efficient proposal of margin combinations, using Bayesian model comparison based on out-of-sample predictive performance, which eliminates the effect of the prior.
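The project's samplers are far more involved, but the distinction between updating parameters one at a time and proposing a joint move can be sketched with a toy random-walk Metropolis sampler on a two-parameter posterior. The Normal likelihood, flat priors, step size, and burn-in below are all illustrative assumptions, not the copula model described above.

```python
import numpy as np

def log_post(theta, data):
    """Toy log-posterior: Normal(mu, sigma) likelihood with flat priors.
    theta = (mu, log_sigma); the actual project targets copula densities."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    return -len(data) * log_sigma - 0.5 * np.sum((data - mu) ** 2) / sigma**2

def joint_metropolis(data, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis updating ALL parameters in one joint move,
    rather than one component (or one margin) at a time."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    lp = log_post(theta, data)
    draws = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal(size=2)   # joint proposal
        lp_prop = log_post(prop, data)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject as a block
            theta, lp = prop, lp_prop
        draws.append(theta.copy())
    return np.array(draws)

data = np.random.default_rng(2).normal(loc=1.0, scale=2.0, size=500)
draws = joint_metropolis(data)
print(draws[2500:, 0].mean())  # posterior mean of mu, near the true value 1.0
```

Joint proposals of this kind are what allow dependence between parameters to be respected in a single accept/reject decision, which is the motivation for updating all marginal and joint density features together rather than margin by margin.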
Our proposed Bayesian approach is applied to high-dimensional stock market data with additional text information provided by Bloomberg.