Prediction of environmental missing data time series by Support Vector Machine Regression and Correlation Dimension estimation

Authors

F. Camastra, V. Capone, A. Ciaramella, A. Riccio and A. Staiano

Journal

Environmental Modelling & Software (Elsevier)

Abstract

Environmental time series are often affected by missing data, namely data unavailability at certain time points. This paper presents the Iterated Imputation and Prediction algorithm, which allows the prediction of time series with missing data. The algorithm iteratively uses Correlation Dimension Estimation of the underlying dynamical system generating the time series to fix the model order (i.e., how many past samples are required to model the time series accurately), and Support Vector Machine Regression to estimate the skeleton of the time series. Experimental validation of the algorithm on three environmental time series with missing data, expressing the concentration of ozone at three European sites, shows a small average percentage prediction error for all time series on the test set.
 

Description

The article “Prediction of environmental missing data time series by Support Vector Machine Regression and Correlation Dimension estimation” presents a methodology for forecasting environmental time series with missing data that also reconstructs the missing values.

The proposed Iterated Imputation and Prediction method combines two components that are applied iteratively. Correlation Dimension Estimation identifies the model order of the time series (i.e., how many past samples are required to model it accurately). Support Vector Machine Regression then estimates the underlying temporal dynamics and predicts the missing values.
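To make the correlation dimension step more concrete, the following is a minimal sketch of a Grassberger-Procaccia style estimate in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`delay_embed`, `correlation_integral`, `correlation_dimension`), the max-norm distance, the lag of 1, and the choice of radii are all hypothetical. The sketch stops at the dimension estimate and does not reproduce the paper's rule for turning it into a model order, nor the Support Vector Machine Regression step.

```python
import math


def delay_embed(series, dim, lag=1):
    """Build delay vectors (x[t], x[t+lag], ..., x[t+(dim-1)*lag])."""
    count = len(series) - (dim - 1) * lag
    return [[series[t + k * lag] for k in range(dim)] for t in range(count)]


def correlation_integral(vectors, radius):
    """Fraction of distinct vector pairs closer than radius (max-norm)."""
    n = len(vectors)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            gap = max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
            if gap < radius:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(series, dim, radii, lag=1):
    """Least-squares slope of log C(r) versus log r over the given radii."""
    vectors = delay_embed(series, dim, lag)
    points = []
    for radius in radii:
        c = correlation_integral(vectors, radius)
        if c > 0:  # log is undefined when no pair falls inside the radius
            points.append((math.log(radius), math.log(c)))
    if len(points) < 2:
        raise ValueError("need at least two radii with a nonzero correlation integral")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


# Toy input (not data from the paper): a sine wave sampled once per radian.
toy_series = [math.sin(t) for t in range(400)]
toy_radii = [0.05, 0.1, 0.2, 0.3]  # assumed scaling region for this toy signal
print(correlation_dimension(toy_series, dim=2, radii=toy_radii))
```

For a pure sine wave the delay vectors trace a closed curve, so the estimate is expected to come out close to 1. A real, noisy series with gaps would need a careful choice of lag and radii, and the pairwise loop above is quadratic in the series length, so it only suits short series.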

The approach has been validated on ozone concentration measurements from three European monitoring sites, where it achieved a small average percentage prediction error on the test set for all three series despite substantial data gaps. This indicates its suitability for enhancing the reliability of environmental time series used in scientific analyses and environmental monitoring.
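As a rough illustration of what a percentage prediction error measures, the sketch below computes a mean absolute percentage error while skipping time points with no observation. This page does not give the exact error definition used in the article, so the formula, the function name `mean_absolute_percentage_error`, and the use of `None` to mark missing values are assumptions made here for illustration.

```python
def mean_absolute_percentage_error(observed, predicted):
    """Average of |observed - predicted| / |observed|, in percent.

    Time points whose observation is missing (None) or zero are skipped,
    because the relative error is undefined there.
    """
    ratios = [
        abs(obs - pred) / abs(obs)
        for obs, pred in zip(observed, predicted)
        if obs is not None and obs != 0
    ]
    if not ratios:
        raise ValueError("no usable observations to score against")
    return 100.0 * sum(ratios) / len(ratios)


# Made-up numbers for illustration only; they are not results from the paper.
observed = [100.0, 200.0, None, 50.0]
predicted = [110.0, 180.0, 70.0, 50.0]
print(mean_absolute_percentage_error(observed, predicted))
```

In this toy case the three scored points have relative errors of 10%, 10% and 0%, so the average is about 6.67%. The prediction at the missing time point does not enter the score.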