Our GRATIS paper (GeneRAting TIme Series with diverse and controllable characteristics) has been accepted by the ASA data science journal, Statistical Analysis and Data Mining.
Yanfei Kang, Rob J Hyndman, and Feng Li*. (2020). GRATIS: GeneRAting TIme Series with diverse and controllable characteristics, Statistical Analysis and Data Mining. (In Press)
[Journal version | Working Paper | R Package | Web App]
The explosion of time series data in recent years has brought a flourish of new time series analysis methods, for forecasting, clustering, classification and other tasks. The evaluation of these new methods requires either collecting or simulating a diverse set of time series benchmarking data to enable reliable comparisons against alternative approaches. We propose GeneRAting TIme Series with diverse and controllable characteristics, named GRATIS, with the use of mixture autoregressive (MAR) models. We simulate sets of time series using MAR models and investigate the diversity and coverage of the generated time series in a time series feature space. By tuning the parameters of the MAR models, GRATIS is also able to efficiently generate new time series with controllable features. In general, as a costless surrogate to the traditional data collection approach, GRATIS can be used as an evaluation tool for tasks such as time series forecasting and classification. We illustrate the usefulness of our time series generation process through a time series forecasting application.
We have proposed an efficient simulation method, GRATIS, for generating time series with diverse characteristics requiring minimal input of human effort and computational resources. Our generated dataset can be used as benchmarking data in the time series domain, which functions similarly to other machine learning data repositories. The simulation method is based on mixture autoregressive models where the parameters are assigned with statistical distributions. In such a way, we provide a general benchmarking tool serving for advanced time series analysis where a large collection of benchmarking data is required, including forecasting comparison, model averaging, and time series model training with self-generated data. To the best of our knowledge, this is the first paper that thoroughly studies the possibility of generating a rich collection of time series. Our method not only generates realistic time series data but also gives a higher coverage of the feature space than existing time series benchmarking data.
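To make the idea of simulating from mixture autoregressive models with randomly drawn parameters concrete, here is a minimal Python sketch. It is not the code of the R package: the function names `simulate_mar` and `random_mar_params`, the Dirichlet and uniform parameter distributions, and the limits on the number of components and the AR order are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_mar(n, weights, ar_coefs, sigmas, rng, burn_in=100):
    """Simulate one series from a K-component mixture autoregressive model.

    At each time step a component k is drawn with probability weights[k];
    the next value follows that component's AR recursion plus Gaussian noise.
    """
    weights = np.asarray(weights, dtype=float)
    max_lag = max(len(c) for c in ar_coefs)
    total = n + burn_in
    y = np.zeros(total + max_lag)
    for t in range(max_lag, total + max_lag):
        k = rng.choice(len(weights), p=weights)
        phi = np.asarray(ar_coefs[k])
        lags = y[t - len(phi):t][::-1]  # y[t-1], y[t-2], ...
        y[t] = phi @ lags + sigmas[k] * rng.standard_normal()
    return y[-n:]  # drop the burn-in period

def random_mar_params(rng, max_components=5, max_order=3):
    """Draw MAR parameters from simple distributions (illustrative assumption)."""
    k = rng.integers(1, max_components + 1)
    weights = rng.dirichlet(np.ones(k))
    ar_coefs = []
    for _ in range(k):
        p = rng.integers(1, max_order + 1)
        c = rng.uniform(-1, 1, size=p)
        # Rescale so the absolute coefficients sum to at most 0.9,
        # a sufficient condition for each component to be stationary.
        c = 0.9 * c / max(1.0, np.abs(c).sum())
        ar_coefs.append(c)
    sigmas = rng.uniform(0.5, 2.0, size=k)
    return weights, ar_coefs, sigmas

# Generate a small collection of series, each from a freshly drawn MAR model.
rng = np.random.default_rng(0)
dataset = [simulate_mar(120, *random_mar_params(rng), rng) for _ in range(5)]
```

Drawing a new parameter set for every series is what produces diversity in this sketch: each series comes from a different mixture, so the collection spreads out over many kinds of dynamics.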
The GRATIS approach can also efficiently generate new time series with controllable target features by tuning the parameters of the MAR models. This is particularly useful in time series classification, or in specific areas where only some features are of interest. The procedure is the inverse of feature extraction, which usually requires considerable computational power. Our approach of generating new time series from given features speeds up the computation by a factor of 40, making feature-driven time series analysis tasks feasible.
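The idea of tuning MAR parameters until a simulated series matches target features can be sketched as a small search problem. The example below is an assumption-laden toy: it uses a plain random search as a stand-in optimizer (this post does not say which optimizer GRATIS uses), a two-component mixture of AR(1) processes, and only two hand-picked features (lag-1 autocorrelation and standard deviation). All function names are hypothetical.

```python
import numpy as np

def simulate_mar1(n, weights, phis, sigmas, rng, burn_in=50):
    """Mixture of AR(1) components: each step picks one component at random."""
    total = n + burn_in
    ks = rng.choice(len(weights), size=total, p=weights)
    eps = rng.standard_normal(total)
    y = np.zeros(total + 1)
    for t in range(1, total + 1):
        k = ks[t - 1]
        y[t] = phis[k] * y[t - 1] + sigmas[k] * eps[t - 1]
    return y[-n:]

def features(y):
    """Two toy features: lag-1 autocorrelation and standard deviation."""
    z = y - y.mean()
    acf1 = (z[1:] @ z[:-1]) / (z @ z)
    return np.array([acf1, y.std()])

def generate_with_target(target, n, rng, n_candidates=300):
    """Random search: keep the simulated series whose features are closest to target."""
    best_y, best_dist = None, np.inf
    for _ in range(n_candidates):
        weights = rng.dirichlet(np.ones(2))
        phis = rng.uniform(-0.95, 0.95, size=2)
        sigmas = rng.uniform(0.2, 3.0, size=2)
        y = simulate_mar1(n, weights, phis, sigmas, rng)
        dist = np.linalg.norm(features(y) - target)
        if dist < best_dist:
            best_y, best_dist = y, dist
    return best_y, best_dist

rng = np.random.default_rng(1)
target = np.array([0.6, 1.5])  # desired [lag-1 autocorrelation, standard deviation]
series, dist = generate_with_target(target, n=200, rng=rng)
```

A smarter optimizer would reuse information from good candidates instead of sampling blindly, but the structure stays the same: simulate, extract features, compare with the target, and adjust the MAR parameters.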
We further show that the GRATIS scheme can serve as a useful resource for time series applications. In particular, we present a novel time series forecasting approach that exploits the features of the generated time series. Our application also sheds light on a potential direction for forecasting with private data, where the model training could be based purely on our generated data. The take-home message is that the simulated series are similar to the original series in terms of features, but this does not mean that they look alike visually.
Other potential extensions include: (i) GRATIS with exogenous information via mixture of ARIMA with explanatory variables (ARIMAX) to allow for local patterns due to external events, (ii) GRATIS with multivariate time series by exploring mixtures of vector autoregression models, (iii) GRATIS with cross-sectional information about the time series by exploring the approaches, (iv) extending GRATIS to discrete time series by investigating the mixture of integer-valued autoregressive processes or Poisson autoregression, and (v) using GRATIS to serve as a pre-training process of deep learning methods to save time and improve accuracy.