The Comparison Between VAR and ARIMAX Time Series Models in Forecasting

Abstract


Introduction
Time series analysis is a statistical technique for studying data points collected or recorded over time. The observations are ordered chronologically, typically at uniform intervals such as daily, monthly, or yearly. Time series data arise in many fields, including economics, finance, environmental science, and engineering. Time series analysis supports both understanding of and prediction from temporal data, allowing businesses and researchers to make informed decisions and to respond to changing trends and patterns over time.
Vector Autoregression (VAR) is a statistical modelling technique commonly used in the analysis of multivariate time series data. It extends the autoregressive (AR) model, which describes the relationship between a single time series and its own past values, by modelling the relationships between several time series variables and their past values simultaneously. VAR models are therefore useful for capturing the dynamic interactions between multiple variables over time. They are commonly used in macroeconomics, finance, and other fields where understanding the joint behaviour of multiple time series is crucial for decision-making and policy analysis [22].
ARIMAX, which stands for Autoregressive Integrated Moving Average with exogenous inputs, is a time series forecasting model that extends the traditional ARIMA (Autoregressive Integrated Moving Average) model by incorporating exogenous (external) predictor variables. It is used when the behaviour of the series is influenced not only by its own past values but also by external factors. By considering both the history of the series and the influence of other relevant variables, ARIMAX can produce more accurate predictions [1].
Forecasting in time series analysis is the process of making predictions or estimates about future data points based on historical observations. It is a critical component in various fields, including finance, economics, meteorology, and operations management. Several forecasting methods are available, and the choice among them depends on the nature of the time series data, including its stationarity, seasonality, and the presence of exogenous variables. It is often recommended to experiment with different methods and evaluate their performance to select the most appropriate approach for a given dataset. Time series forecasting also requires continuous monitoring and re-evaluation as new data become available, allowing for model updates and improvements [4].
The Iraqi budget, like the budget of any other country, is a financial plan that outlines the government's expected revenues, expenditures, and allocations for a fiscal year. In this paper, the time series of foreign reserves and government spending are analyzed using VAR and ARIMAX models, and the two models are then compared to identify the better one for forecasting.

Theoretical Aspect
The theoretical aspect presents some basic statistical concepts related to the subject of the research, as described in the following paragraphs.

Time Series
Time series forecasting is widely used across diverse fields, including statistics, inventory management, and economics. Various forecasting models are available, ranging from simpler techniques like moving averages and linear regression to more complex approaches like ARIMA and neural networks [21]. Time series analysis is a statistical method used to analyze and model data that are collected and ordered over time, and it starts from the definition of a time series and its components [16].
Time Series Definition: A time series is an ordered sequence of observations, usually taken at equally spaced intervals over time.

The objective of Time Series Modeling:
The primary objective of time series modelling is to study and analyze historical data from a time series to develop appropriate models that can be used to make predictions or forecasts for future values of the series [17]. Time series analysis helps uncover patterns, trends, and relationships within the data.
Components of Time Series: Time series data can be decomposed into four main components, which are essential for understanding its underlying structure:

Trend (T):
The trend component represents the long-term direction or movement in the data. It captures gradual changes over time, such as an increasing or decreasing pattern.

Cyclical (C):
The cyclical (periodic) component accounts for recurring patterns in the data that do not necessarily follow a linear trend. Unlike seasonal patterns, these cycles can have different, non-fixed time intervals [1].

Seasonal (S):
Seasonal patterns are similar to periodic patterns but occur at fixed and known intervals, such as daily, monthly, or yearly. Seasonal effects are often associated with calendar-related events, like holidays or weather.

Irregular (I):
The irregular component, also referred to as noise or residual, represents the unexplained or random fluctuations in the data that cannot be attributed to the other components. It includes any random variations and outliers in the time series.

Stationary Time Series
Stationarity holds a central position in time series analysis, as it is a fundamental prerequisite for many statistical and mathematical techniques. A stationary series keeps consistent statistical characteristics over time: an unchanging mean (indicating the absence of a trend), a uniform variance, and a stable autocorrelation structure [19].
In practical terms, numerous time series datasets exhibit non-stationary behaviour, in which statistical properties such as the mean, variance, or trend change over time [20]. This non-stationary nature can pose challenges when applying traditional time series analysis methods, which often assume stationarity [18].
To address this challenge, non-stationary time series data can be converted into a stationary form through two primary methods:
Differencing: This technique computes the differences between consecutive data points to eliminate any trend and achieve a constant mean.
Transformation: In this approach, mathematical operations, such as taking the natural logarithm or square root, are used to stabilize the variance [2].
The primary objective in ensuring stationarity is to simplify the analysis of time series data. Several statistical models, including ARIMA (Autoregressive Integrated Moving Average), rely on the assumption of stationarity. By transforming a time series into a stationary one, it becomes possible to obtain more accurate forecasts and model estimates [13].
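As an illustration of the two methods above, the following minimal Python sketch applies differencing and a logarithmic transformation to two small synthetic series. The function names and the data are hypothetical and are not taken from the study; they only show the mechanics.

```python
import math

def difference(series, order=1):
    """Apply first differencing `order` times to a sequence of numbers."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def log_transform(series):
    """Natural-log transform; every value must be strictly positive."""
    return [math.log(v) for v in series]

# Hypothetical series with a linear trend: 10, 13, 16, ...
trend = [10 + 3 * t for t in range(8)]
d1 = difference(trend)            # the trend is removed, leaving a constant

# Hypothetical series growing by 10% per period.
growth = [100 * 1.1 ** t for t in range(8)]
log_d1 = difference(log_transform(growth))  # constant growth rate in logs

print(d1)
print([round(v, 4) for v in log_d1])
```

Differencing shortens the series by one observation per pass, which matters for short annual series such as budget data.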

VAR Model
A VAR (Vector Autoregression) model is a multivariate time series model, used in time series analysis and econometrics, that describes the relationship between multiple time series variables [14]. In a VAR model, each variable is modelled as a linear combination of its own past values and the past values of all other variables in the system. For example, the system of equations for a VAR(1) model with two time series (variables `y1` and `y2`) is as follows:
\[ y_{1,t} = c_1 + a_{11}\, y_{1,t-1} + a_{12}\, y_{2,t-1} + e_{1,t} \]
\[ y_{2,t} = c_2 + a_{21}\, y_{1,t-1} + a_{22}\, y_{2,t-1} + e_{2,t} \]
where $c_1, c_2$ are constants, $a_{ij}$ are the autoregressive coefficients, and $e_{1,t}, e_{2,t}$ are the error terms.
A key feature of a VAR model is that, unlike univariate time series models, which analyze a single variable in isolation, it considers multiple variables simultaneously. This is particularly useful for capturing and modelling the dynamic interactions and feedback loops among different variables [22].
A VAR model is typically specified with an order, denoted as VAR(p), where "p" is the number of lagged time points included in the model. The choice of order is an important part of VAR modelling and affects the complexity of the model. As with univariate time series models, stationarity is important for VAR models: the series entering a VAR(p) model should be stationary, so non-stationary series are typically differenced until they become stationary before the model is fitted. The order of differencing is chosen separately from the lag order p.
The parameters of a VAR model are usually estimated by least squares or maximum likelihood. VAR models are often used for forecasting and for assessing the impact of shocks on the variables within the system. Impulse response functions can be calculated to show how a shock to one variable affects the others over time [15].
VAR models can be extended to VARMA (Vector Autoregressive Moving Average) models to account for moving average components, and they are also a component of the more comprehensive VARMAX models, which incorporate exogenous variables. These models are useful for tasks such as economic forecasting, policy analysis, and risk assessment in financial markets.
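The least-squares estimation and forecasting of a VAR model described above can be sketched for the simplest case, a two-variable VAR(1). The sketch below assumes NumPy is available; the function names, the simulated data, and the coefficient values are hypothetical and serve only to show the mechanics, not the model fitted in this study.

```python
import numpy as np

def simulate_var1(c, A, n, rng, noise_scale=0.5):
    """Simulate y_t = c + A y_{t-1} + e_t with Gaussian errors."""
    k = len(c)
    y = np.zeros((n, k))
    for t in range(1, n):
        y[t] = c + A @ y[t - 1] + rng.normal(scale=noise_scale, size=k)
    return y

def fit_var1(y):
    """Equation-by-equation least squares for a VAR(1) with intercept."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # regressors: 1, y_{t-1}
    Y = y[1:]                                           # responses: y_t
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)        # shape (1 + k, k)
    return coef[0], coef[1:].T                          # intercepts c, matrix A

def forecast_var1(c, A, last, steps):
    """Iterate the fitted equations forward from the last observation."""
    y = np.asarray(last, dtype=float)
    path = []
    for _ in range(steps):
        y = c + A @ y
        path.append(y)
    return np.array(path)

rng = np.random.default_rng(0)
c_true = np.array([1.0, 0.5])                  # hypothetical intercepts
A_true = np.array([[0.5, 0.2], [0.1, 0.4]])    # hypothetical lag coefficients
y = simulate_var1(c_true, A_true, 2000, rng)

c_hat, A_hat = fit_var1(y)
fc = forecast_var1(c_hat, A_hat, y[-1], steps=4)
print(np.round(A_hat, 2))
print(np.round(fc, 2))
```

A VAR(p) with p greater than 1 follows the same pattern with additional lagged columns in the regressor matrix.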

ARIMAX Model
ARIMAX is a time series forecasting model that combines ARIMA principles with exogenous variables. It extends ARIMA by including external predictors (denoted as X) to improve forecasting accuracy. Building the model involves specifying the AR, I, and MA components along with the exogenous variables, estimating the model parameters, and making forecasts [3]. It is commonly used in economics, finance, and environmental science to account for external influences on time series data. The ARMAX(p,q,r) model equation is
\[ y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \sum_{k=1}^{r} \beta_k\, x_{k,t} + e_t + \sum_{j=1}^{q} \theta_j\, e_{t-j} \]
where $c$ is a constant, $x_{1,t}, x_{2,t}, \ldots, x_{r,t}$ represent the exogenous variables, $\beta_1, \beta_2, \ldots, \beta_r$ are their coefficients [12], and r is the number of exogenous variables.
AR Terms: The $\phi_1, \phi_2, \ldots, \phi_p$ terms are autoregressive terms, indicating the relationship between the dependent variable and its past values.
MA Terms: The $e_t, \theta_1 e_{t-1}, \theta_2 e_{t-2}, \ldots, \theta_q e_{t-q}$ terms are moving average terms, which account for the influence of past errors in the model.
Application of ARIMAX: ARIMAX models are applied in various fields, including economics, agriculture, and engineering, to improve predictive performance compared to the basic ARIMA model. The d parameter in the ARIMAX model indicates the number of times differencing is applied to the time series to make it stationary. This is often necessary when dealing with non-stationary data [4].
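To make the roles of the AR term, the differencing order d, and the exogenous variable concrete, the following sketch fits an ARIMAX(1,1,0) model with one exogenous variable by conditional least squares. It assumes the simplified form Δy_t = c + φ Δy_{t-1} + β x_t + e_t and NumPy; the function names, data, and coefficient values are hypothetical, and dedicated software may use a different parameterization or estimator.

```python
import numpy as np

def fit_arimax_110(y, x):
    """Least-squares fit of  dy_t = c + phi * dy_{t-1} + beta * x_t + e_t."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    dy = np.diff(y)                      # d = 1: first difference of y
    target = dy[1:]                      # dy_t
    X = np.column_stack([
        np.ones(len(target)),            # constant c
        dy[:-1],                         # AR(1) term: dy_{t-1}
        x[2:],                           # exogenous variable aligned with dy_t
    ])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef                          # (c, phi, beta)

def forecast_next(y, coef, x_next):
    """One-step forecast of the level; needs the next exogenous value."""
    c, phi, beta = coef
    last_diff = y[-1] - y[-2]
    return y[-1] + c + phi * last_diff + beta * x_next

# Hypothetical data generated from the same model form.
rng = np.random.default_rng(1)
n = 1500
x = rng.normal(size=n)
dy_sim = np.zeros(n)
for t in range(1, n):
    dy_sim[t] = 0.2 + 0.6 * dy_sim[t - 1] + 1.5 * x[t] + rng.normal(scale=0.3)
y = np.cumsum(dy_sim)

coef = fit_arimax_110(y, x)
print(np.round(coef, 2))
```

Forecasting with an exogenous variable requires a value (or a forecast) of that variable for each future period, which is why `forecast_next` takes `x_next` as an argument.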

Efficiency criteria for estimated models
AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are two commonly used statistical measures for model selection and evaluation in regression models, including linear regression and time series models such as ARIMA and VAR. Both assess the trade-off between model complexity and goodness of fit; lower values of AIC and BIC indicate better-fitting and more parsimonious models [10]. The AIC is used with a wide range of statistical models, including linear regression and time series models [9].
The AIC value for a given model is calculated using the following formula [5]:
\[ \mathrm{AIC} = -2\ln(L) + 2k \]
where $L$ is the maximized likelihood of the model and $k$ is the number of parameters. Each component of the formula is interpreted as follows:
1. $-2\ln(L)$: This term measures how well the model fits the observed data. The log-likelihood measures how probable the observed data are under the model [8]. Because of the negative sign, a higher likelihood (a better fit) yields a lower AIC.
2. $2k$: This term is a penalty for model complexity, where "k" is the number of parameters in the model. The penalty encourages the selection of simpler models, since each added parameter increases the AIC unless it is offset by a sufficient improvement in fit [11].
Model Selection: AIC is used to compare different models. When comparing models, the model with the lowest AIC value is typically chosen because it represents the best trade-off between goodness of fit and model complexity [7].
Bayesian Information Criterion (BIC): It is a statistical criterion used for model selection among a finite set of models. BIC balances the goodness of fit of the model with the number of parameters, penalizing models with more parameters to avoid overfitting [6].
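The two criteria can be computed from a model's log-likelihood as in the following sketch. It uses the standard definitions AIC = -2 ln(L) + 2k and BIC = -2 ln(L) + k ln(n), and assumes Gaussian errors so that the log-likelihood can be obtained from the residuals. The residuals and parameter counts are hypothetical and are not the values reported in this study.

```python
import math

def gaussian_loglik(residuals):
    """Maximized Gaussian log-likelihood, with sigma^2 estimated as RSS / n."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2) + 1)

def aic(loglik, k):
    """Akaike Information Criterion: fit term plus 2 per parameter."""
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    """Bayesian Information Criterion: fit term plus ln(n) per parameter."""
    return -2 * loglik + k * math.log(n)

# Hypothetical residuals from a small model (k = 2) and a larger one (k = 5).
resid_small = [0.5, -0.4, 0.3, -0.6, 0.2, -0.1, 0.4, -0.3]
resid_large = [0.45, -0.4, 0.3, -0.55, 0.2, -0.1, 0.4, -0.3]

n = len(resid_small)
aic_small = aic(gaussian_loglik(resid_small), k=2)
aic_large = aic(gaussian_loglik(resid_large), k=5)
bic_small = bic(gaussian_loglik(resid_small), k=2, n=n)
bic_large = bic(gaussian_loglik(resid_large), k=5, n=n)
print(aic_small, aic_large, bic_small, bic_large)
```

In this hypothetical case the larger model fits slightly better, but not by enough to offset its extra parameters, so the smaller model has the lower AIC.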
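To compare two different models, the mean square error (MSE) is used [23]:
\[ \mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2 \]
where $y_t$ is the observed value, $\hat{y}_t$ is the value fitted or forecast by the model, and $n$ is the number of observations. The model with the smaller MSE is preferred.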
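A comparison of two models by MSE can be sketched as follows. The observed values and the two sets of fitted values are hypothetical and only illustrate the calculation.

```python
def mse(actual, predicted):
    """Mean square error between observed and fitted (or forecast) values."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical observations and fitted values from two competing models.
observed = [10, 12, 14, 16]
fitted_a = [11, 12, 13, 17]
fitted_b = [10, 14, 14, 14]

mse_a = mse(observed, fitted_a)
mse_b = mse(observed, fitted_b)
better = "A" if mse_a < mse_b else "B"
print(mse_a, mse_b, better)
```

Because the errors are squared, MSE is expressed in squared units of the series and is sensitive to a few large errors.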

Application Aspect
The practical aspect deals with two correlated time series representing two variables in the general budget of Iraq, namely foreign reserves (x1) and government spending (x2); the data are given in the appendix. Both series show a general trend, as shown in Figure 1.
The linear correlation coefficient between foreign reserves and government spending is 0.914 (91.4%), which is positive and statistically significant because its p-value is reported as zero, which is less than the significance level (0.05). Figure 2 shows the cross-correlation between foreign reserves and government spending.
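The correlation and cross-correlation measures used above can be computed as in the following sketch. The two short series are hypothetical stand-ins for the budget data, and the function names are illustrative; the sample cross-correlation at lag 0 coincides with the linear (Pearson) correlation coefficient.

```python
import math

def _centered(series):
    """Subtract the sample mean from every observation."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]

def cross_correlation(a, b, lag=0):
    """Sample cross-correlation between a[t] and b[t + lag], for lag >= 0."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    if not 0 <= lag < len(a):
        raise ValueError("lag must be between 0 and len(a) - 1")
    ca, cb = _centered(a), _centered(b)
    denom = math.sqrt(sum(u * u for u in ca) * sum(v * v for v in cb))
    num = sum(ca[t] * cb[t + lag] for t in range(len(a) - lag))
    return num / denom

def pearson(a, b):
    """Linear correlation coefficient (cross-correlation at lag 0)."""
    return cross_correlation(a, b, lag=0)

# Hypothetical values, not the series analyzed in this paper.
reserves = [2.0, 3.1, 4.5, 5.2, 6.8, 7.1, 8.9, 9.4]
spending = [3.0, 3.9, 5.8, 6.1, 7.5, 8.8, 9.7, 11.0]

r = pearson(reserves, spending)
ccf = [cross_correlation(reserves, spending, lag) for lag in range(4)]
print(round(r, 3), [round(v, 3) for v in ccf])
```

Two series that both trend upward will show a high correlation even without a direct relationship, which is one reason stationarity is checked before modelling.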
Formal tests are used to determine whether a time series is stationary. One of these is the Augmented Dickey-Fuller (ADF) test, which indicates whether the series is stationary around a mean or a linear trend, or non-stationary because of a unit root. It tests the following hypotheses:
Null Hypothesis: The data contain a unit root.
Alternative Hypothesis: The data do not contain a unit root.
Table 1 shows that the results of the ADF test indicate that the time series of Foreign Reserves and Government Spending are stationary at the first difference, since the absolute values of the test statistic (2.4182, 2.3499) are greater than the absolute critical value (1.956) and the p-values (0.019, 0.022) are less than the significance level (0.05). Figures 3 and 4 show the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) for the variables (x1) and (x2), respectively.
As mentioned in the theoretical aspect, VAR models are an extended version of the AR model; here the model includes two variables, which allows it to take advantage of the cross-correlation that may be present between them. We now look for a suitable model to predict x1 and x2 together. The AR-stationary 3-dimensional VAR(3) model with linear time trend was chosen because it has the lowest AIC and BIC values. The best model is selected by comparing the three estimated models on the AIC and BIC criteria, as shown in Table 2.
Table 2 shows that the VAR(3) model is the best because its AIC and BIC values are lower than those of the other models, so this third model was relied upon. Figure 5 shows the model fit for both variables. Figure 6 shows that the residuals fluctuate around the zero line. In addition, all values of the autocorrelation coefficients fall within the confidence interval for both variables, and most of the residual values fall within the standard curve. After passing all the necessary tests, the VAR(3) model is ready to forecast future values.
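The ADF test statistic discussed above can be sketched as the t-ratio of the coefficient on the lagged level in a regression of the differenced series. The sketch below assumes the variant without constant or trend, which is consistent with a 5% critical value of about -1.95 but is an assumption about the test configuration used in this study; it also assumes NumPy, and the function name and simulated data are hypothetical. It returns only the statistic, not a p-value.

```python
import numpy as np

def adf_statistic(y, lags=1):
    """t-ratio of gamma in  dy_t = gamma * y_{t-1} + sum_i delta_i * dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[lags:]                               # dy_t
    cols = [y[lags:-1]]                              # lagged level y_{t-1}
    for i in range(1, lags + 1):
        cols.append(dy[lags - i: len(dy) - i])       # lagged differences dy_{t-i}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    dof = len(target) - X.shape[1]
    sigma2 = resid @ resid / dof                     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # coefficient covariance
    return coef[0] / np.sqrt(cov[0, 0])

# Hypothetical random walk: non-stationary in levels, stationary in differences.
rng = np.random.default_rng(42)
walk = np.cumsum(rng.normal(size=400))

stat_level = adf_statistic(walk, lags=1)
stat_diff = adf_statistic(np.diff(walk), lags=1)
print(round(stat_level, 3), round(stat_diff, 3))
```

The null hypothesis of a unit root is rejected when the statistic is more negative than the critical value, which is why the first-differenced random walk produces a strongly negative statistic.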
The best model estimated above, VAR(3), with an MSE of 114.5, was used to forecast the Foreign Reserves and Government Spending of Iraq for the four years 2021-2024; the forecasts are summarized in Table 4. Table 4 shows an expected increase in both Foreign Reserves and Government Spending in the coming years, with Government Spending remaining higher than Foreign Reserves, which implies a continuation of the general deficit in the budget of Iraq in the coming years, as shown in Figure 7.
As mentioned in the theoretical aspect, ARIMAX models are an extended version of the ARIMA model that includes other independent (predictor) exogenous variables. ARIMAX models are similar to multiple regression models, except that they take advantage of the autocorrelation that may be present in the regression residuals. We now look for a suitable model to predict y in the presence of an external variable x. Out of a total of 32 possible models, the ARIMAX(1,1,0), (2,1,0), and (3,1,0) models were chosen based on the significance of the estimated model parameters and the lowest AIC and BIC values, as shown in Table 5.
Table 5 shows that the ARIMAX(3,1,0) model is the best because its AIC and BIC values are lower than those of the other models, so this third model was relied upon. In general form, it is written as
\[ (1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(1 - B)\, y_t = c + \beta x_t + e_t \]
where $B$ is the backshift operator, $\phi_1, \phi_2, \phi_3$ are the autoregressive parameters, $c$ is a constant, $\beta$ is the regression parameter of the exogenous variable $x_t$, and $e_t$ is the error term. Figure 8 shows the model fit. Table 7 shows an expected increase in government spending in the coming years, with government spending remaining higher than foreign reserves, which implies a continuation of the general deficit in the budget of Iraq in the coming years, as shown in Figure 10.

Figure 1. Time series for the foreign reserves and government spending

Figure 3. ACF and PACF for the series Foreign Reserves

Figure 4. ACF and PACF for the series Government Spending

3.1. VAR Models for time series data:

Figure 7. The time series data with forecasting

3.2. ARIMAX Models for time series data:

Table 2. VAR models efficiency criteria

Table 3 shows the estimation results for the VAR(3) model:

Table 3 clearly shows the statistical significance of some estimated parameters, including the trend 1 parameter, which supports the strength of this model. Therefore, the model is

Table 6 clearly shows the statistical significance of the estimated parameters and the regression parameter, Figure