Abstract
Predicting future behavior is an important topic in the statistical sciences because it is needed in many areas of life, and most countries base their development programs on advanced scientific foundations and methods in order to achieve more effective results. This research compares the accuracy of time series prediction using state space models and the matching patterns method of the Singh (2001) algorithm. The methods are applied to real economic data previously analyzed by Box and Jenkins (1976), in which the input is a leading indicator and the output is sales. The importance of this research lies in identifying the most accurate method for predicting time series. MATLAB was used to obtain the results. The main finding is that, for the studied data, the state space model forecasts more accurately than the matching patterns method, because it has the lowest values of the prediction accuracy criteria.
1. Introduction
The philosophy of applied statistics lies in trying to model different phenomena with models that are as close as possible to actual reality. The strength of these models is measured by how well they satisfy the properties required for statistical inference. Such models take different forms and types; some are probabilistic, and their formulation depends purely on probabilities (as in time series models).
Forecasting is one of the most important pillars supporting planning processes, since no planning work can be completed unless it is based on scientific forecasts obtained with sound methods. Enterprises therefore choose the most appropriate forecasting methods from among the many available, according to their needs and capabilities. Time series methods are among the most powerful methodologies used in forecasting applications, and the Box-Jenkins method is one of the best methods for time series analysis.
The aim of the research is to compare prediction using the best state space model with prediction using the fuzzy matching patterns method of the Singh (2001) algorithm, according to the prediction accuracy criteria (MSE, MAPE, MAE), through application to real data.
2. Local approximation using Pattern Modeling and Recognition System (PMRS)
Local approximation using patterns can be used to predict the future behavior of a time series [Singh, 2001]. With this technique, the time series is represented as a vector:
$$Y = \{Y_1, Y_2, \ldots, Y_n\} \quad (1)$$
If the current state of the time series is taken with k = 1, it is represented by the last value of the series $Y_i$, namely $Y_n$, and a simple forecasting method can be used to find the past value that is the nearest neighbor of $Y_n$. For example, if that nearest neighbor is $Y_j$, the prediction of $Y_{n+1}$ depends on the value $Y_{j+1}$. The current state can be extended to include more than one value; for example, with k = 2 the current state is defined as $S_c = \{Y_{n-1}, Y_n\}$, the last two values of the series $Y_i$. Such a combination, described by its direction of change, is called a pattern.
The prediction for the current state therefore depends on a past state $S_p = \{Y_{j-1}, Y_j\}$ and on the series value that followed it, $Y_P^{+} = Y_{j+1}$, provided that $\{Y_{j-1}, Y_j\}$ is shown to be the nearest neighbor of $\{Y_{n-1}, Y_n\}$. When such states are described by direction-of-change measures, they are referred to as patterns. In theory the current state can be of any size, but in practice it is represented only by the optimal size, matched against past states of the same size, which gives the most accurate prediction whether the nearest-neighbor distance is small or large. The prediction procedure of the fuzzy matching patterns method can be written as [Singh, 2001]:
$$\hat{Y} = \phi\left(S_c, S_p, Y_P^{+}, k, C\right) \quad (2)$$
where $\hat{Y}$ is the one-step-ahead prediction, $S_c$ the current state, $S_p$ the past state, $Y_P^{+}$ the series value that follows the past state $S_p$, $k$ the pattern size, and $C$ the matching constraint used to find a past pattern identical to the current one.
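The simple nearest-neighbor forecast described above (compare the last k values with every earlier window of k values and forecast the value that followed the closest window) can be sketched as follows. This is a minimal illustration, not Singh's full method: it assumes Euclidean distance on the raw values and ignores the matching constraint $C$, and the function name `nearest_neighbor_forecast` is hypothetical.

```python
import math


def nearest_neighbor_forecast(y, k=1):
    """One-step-ahead forecast by nearest-neighbor matching of raw values.

    The current state S_c is the last k values of y. Every earlier window of
    k values that has a known successor is compared with it using Euclidean
    distance (an assumption), and the value that followed the closest window
    (Y_P^+) is returned as the forecast of the next observation.
    """
    n = len(y)
    if k < 1 or n < k + 1:
        raise ValueError("need k >= 1 and at least k + 1 observations")
    current = y[n - k:]                      # S_c: the last k values
    best_j, best_d = None, math.inf
    # j is the 0-based index where a past window ends; y[j + 1] must exist,
    # which also excludes the current window itself.
    for j in range(k - 1, n - 1):
        past = y[j - k + 1 : j + 1]          # candidate S_p
        d = math.dist(current, past)
        if d < best_d:
            best_j, best_d = j, d
    return y[best_j + 1]                     # Y_P^+: the value after the best match


if __name__ == "__main__":
    series = [10, 12, 11, 20, 21, 12.2]
    print(nearest_neighbor_forecast(series, k=1))  # closest past value is 12, so this prints 11
```

With k = 1 this is the "closest single past value" rule; raising k makes the match depend on a short trajectory rather than a single level.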
To illustrate how matching is used to predict future values, assume that the time series is represented as a vector $Y = \{Y_1, Y_2, \ldots, Y_n\}$, where $n$ is the total number of observations. In most cases the series is a function of time, and the difference vector $S = \{S_1, S_2, \ldots, S_{n-1}\}$ represents the transition from each observation to the next:
$$S_i = Y_{i+1} - Y_i, \qquad i = 1, 2, \ldots, n-1 \quad (3)$$
To define the nearest neighbor mathematically, the series $Y$ is encoded as a vector of changes in direction. For this purpose, each $Y_i$ is encoded as
$$b_i = \begin{cases} 0 & \text{if } Y_{i+1} < Y_i \\ 1 & \text{if } Y_{i+1} > Y_i \\ 2 & \text{if } Y_{i+1} = Y_i \end{cases} \quad (4)$$
The complete time series is therefore encoded as values $b_i$ that are either zero or one, except where the series value does not change, which is denoted by 2. Patterns differ in size, but most often $2 \le k \le 5$; the number of possible shapes at pattern size $k$ is (2k+1), and a single pattern may have many matches. Patterns of small size have simple and distinct shapes, while patterns of large size have complex shapes [Singh, 2000].
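The difference series and the direction coding (0 for a fall, 1 for a rise, 2 for no change, the coding used in Table 3) can be computed as in the short sketch below; the function names are hypothetical. The sample values are observations T = 10 to 14 of Table 3, and the printed results reproduce the differences and patterns listed there.

```python
def differences(y):
    """S_i = Y_{i+1} - Y_i for consecutive observations."""
    return [y[i + 1] - y[i] for i in range(len(y) - 1)]


def encode_directions(s):
    """Direction codes b_i: 0 if the series falls, 1 if it rises, 2 if unchanged.

    "Unchanged" uses exact equality, which holds when the two observations
    are recorded with identical values.
    """
    return [0 if d < 0 else 1 if d > 0 else 2 for d in s]


if __name__ == "__main__":
    y = [201.2, 201.6, 201.5, 201.5, 203.5]  # Table 3, T = 10..14
    s = differences(y)
    print([round(d, 1) for d in s])  # [0.4, -0.1, 0.0, 2.0]
    print(encode_directions(s))      # [1, 0, 2, 1]
```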
3. Fuzzy Pattern Matching Algorithm (Singh, 2001)
This algorithm was proposed by Singh and appears in several studies, including McAtckney & Singh (1998), Singh (1999) and Singh (2001). Its steps are as follows:
1- Start with the smallest pattern size, k = 2, taken from the last two values of the encoded series.
2- Search the encoded time series for the closest matches to this pattern. Suppose the closest match is the group of codes ending at position $j$, together with the corresponding differences; $j$ is the location of the match. The distance between the current pattern and this match is then computed with Eq. (5), where $j$ is the location of the match, $S$ is the difference value, $k$ is the pattern size and the weights $W_i$ are equal to one for all sizes.
3- Calculate the predicted value from the match selected with Eq. (5), that is, the value of $j$ that gives the smallest distance. The prediction of $Y_{n+1}$ is then obtained from one of three formulas according to the encoded direction at the match: one for an upward move, for which the predicted value is higher than the last observation; one for $b_j = 0$; and one for the remaining case. In these formulas, $Y_n$ is the last value of the original series, $\hat{Y}_{n+1}$ is the value to be predicted, $S_{j+1}$ is the difference value of observation $j+1$, $b_j$ is the encoded value of observation $j$, $n$ is the sample size, $k$ is the pattern size, and $i$ is a counter over the values of the pattern, with $k$ ranging from 2 to 5.
4- Performing steps 1 to 3 gives a one-step-ahead prediction for k = 2; the steps are repeated until all prediction values for k = 2 have been obtained.
5- Change k and repeat steps 1 to 4. After all pattern sizes have been applied, error measures are used to find the size whose predictions are closest to the original values.
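The procedure above can be sketched in Python as follows. This is a simplified, hypothetical illustration rather than Singh's implementation, and several details are assumptions because the corresponding equations are not reproduced here: a past pattern matches only if its direction codes are identical to the current ones (no fuzzy tolerance), the distance is taken to be a weighted sum of squared differences with all weights equal to one, the forecast is taken to be the last observation plus the difference that followed the best match, and k is chosen by the MSE of rolling one-step forecasts over a hold-out period. The function names are illustrative.

```python
def differences(y):
    """S_i = Y_{i+1} - Y_i."""
    return [y[i + 1] - y[i] for i in range(len(y) - 1)]


def encode_directions(s):
    """0 = fall, 1 = rise, 2 = no change."""
    return [0 if d < 0 else 1 if d > 0 else 2 for d in s]


def pattern_forecast(y, k, weights=None):
    """Steps 1-3 (sketch): one-step-ahead forecast with pattern size k."""
    s = differences(y)
    b = encode_directions(s)
    m = len(s)
    if m < k + 1:
        raise ValueError("series too short for this pattern size")
    w = weights if weights is not None else [1.0] * k  # W_i = 1 (assumed)
    current_codes = b[m - k:]        # the current pattern: last k direction codes
    current_diffs = s[m - k:]
    best_j, best_e = None, float("inf")
    # A past pattern ending at index j must be followed by a known difference s[j + 1].
    for j in range(k - 1, m - 1):
        if b[j - k + 1 : j + 1] != current_codes:
            continue                 # different shape: not a match
        past_diffs = s[j - k + 1 : j + 1]
        e = sum(wi * (c - p) ** 2 for wi, c, p in zip(w, current_diffs, past_diffs))
        if e < best_e:               # keep the match with the smallest distance
            best_j, best_e = j, e
    if best_j is None:
        return y[-1]                 # no match found: naive forecast (assumption)
    return y[-1] + s[best_j + 1]     # assumed update: last value + step after the match


def choose_pattern_size(y, horizon, sizes=(2, 3, 4, 5)):
    """Steps 4-5 (sketch): score each k by rolling one-step forecasts of the last values."""
    best_k, best_mse = None, float("inf")
    for k in sizes:
        errors = []
        for t in range(len(y) - horizon, len(y)):
            forecast = pattern_forecast(y[:t], k)  # uses only data before time t
            errors.append((y[t] - forecast) ** 2)
        mse = sum(errors) / len(errors)
        if mse < best_mse:
            best_k, best_mse = k, mse
    return best_k, best_mse


if __name__ == "__main__":
    y = [1, 2] * 8                           # toy alternating series
    print(pattern_forecast(y[:-1], 2))       # the alternating pattern predicts 2
    print(choose_pattern_size(y, horizon=3)) # (2, 0.0)
```

With the toy alternating series every pattern size reproduces the hold-out values exactly, so the smallest size, k = 2, is returned with an MSE of zero; on real data the sizes would generally give different errors.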
4. State-Space Models
The state space approach is a mathematical framework for representing dynamic systems, based mainly on the Markov property. A state space representation models a physical system in terms of its inputs and outputs by means of a pair of first-order difference equations. The first, called the state equation, describes the state vector at time $t+1$ as a function of the current state and the input; the second, called the observation equation, describes the output as a function of the state and the input. For the single-input single-output (SISO) case, the two equations are [AlKayat, 2011] [Ibrahim & Hayawi, 2021]:
$$x(t+1) = A\,x(t) + B\,u(t)$$
$$y(t) = C\,x(t) + D\,u(t)$$
where $x(t)$ is the state vector, $u(t)$ the input and $y(t)$ the output. $A$ represents the internal dynamics of the system, with dimension $(n \times n)$; $B$ represents the effect of the control input, with dimension $(n \times 1)$; $C$ projects the state onto the observed variable, with dimension $(1 \times n)$; and $D$ is a real scalar.
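As a concrete illustration of the state equation $x(t+1) = A\,x(t) + B\,u(t)$ and the observation equation $y(t) = C\,x(t) + D\,u(t)$, the sketch below iterates a SISO model forward in time. The matrices in the example are hypothetical values chosen only so that the output is easy to check by hand (with $D = 0$ and $x(0) = 0$, an impulse input produces $C A^{t-1} B$ for $t \ge 1$); they are not the model estimated in this paper.

```python
def simulate_siso(A, B, C, D, u, x0):
    """Iterate x(t+1) = A x(t) + B u(t) and y(t) = C x(t) + D u(t).

    A is an n x n list of lists, B and C are length-n lists (the column and
    row vectors of a single-input single-output system), D is a scalar.
    """
    n = len(x0)
    x = list(x0)
    outputs = []
    for ut in u:
        # Observation equation: output from the current state and input.
        outputs.append(sum(C[i] * x[i] for i in range(n)) + D * ut)
        # State equation: next state from the current state and input.
        x = [sum(A[r][c] * x[c] for c in range(n)) + B[r] * ut for r in range(n)]
    return outputs


if __name__ == "__main__":
    A = [[0.5, 0.0], [0.0, 0.25]]   # hypothetical two-state dynamics
    B = [1.0, 1.0]
    C = [1.0, 1.0]
    D = 0.0
    impulse = [1.0, 0.0, 0.0, 0.0]
    print(simulate_siso(A, B, C, D, impulse, x0=[0.0, 0.0]))  # [0.0, 2.0, 0.75, 0.3125]
```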
These two equations play an important role in the study of dynamic systems: for a given system in discrete time, they express the relation between inputs and outputs through difference equations, and there are usually uncontrolled disturbances, treated as random variables, that affect the outputs [Guerbyenne & Hamdi, 2014], [Nelles, 2001]. State space models provide an effective basis for time series analysis in a wide range of fields, including engineering and economics, and have been used by many researchers, among them [Box & Jenkins et al., 2016]. There are several reasons for using state space models. In particular, they provide a set of recursive equations for computing predictions, known as the Kalman filter, which simplifies the calculation of the prediction error [Kalman, 1960]. State space models are also known as internal models because, unlike other models, they include in parallel both the variables that can be measured and those that cannot; for details, see [Box & Jenkins et al., 2016] [Hayawi, 2022].
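To make the idea of recursive prediction concrete, here is a minimal sketch of a scalar Kalman filter that produces one-step-ahead predictions. It assumes a hypothetical one-dimensional model $x(t+1) = a\,x(t) + w(t)$, $y(t) = c\,x(t) + v(t)$ with noise variances $q$ and $r$; it is not the fourth-order model fitted later in the paper, and the parameter names are illustrative.

```python
def kalman_one_step(y, a, c, q, r, x0=0.0, p0=1.0):
    """One-step-ahead predictions of y(t) from y(0), ..., y(t-1).

    Hypothetical scalar model: x(t+1) = a x(t) + w(t), y(t) = c x(t) + v(t),
    with Var(w) = q and Var(v) = r. x0 and p0 are the prior mean and variance
    of the state. Requires q > 0 or r > 0 so the innovation variance stays positive.
    """
    x_pred, p_pred = x0, p0
    predictions = []
    for yt in y:
        predictions.append(c * x_pred)              # predicted observation
        s = c * p_pred * c + r                      # innovation variance
        gain = p_pred * c / s                       # Kalman gain
        x_filt = x_pred + gain * (yt - c * x_pred)  # measurement update
        p_filt = (1.0 - gain * c) * p_pred
        x_pred = a * x_filt                         # time update (prediction)
        p_pred = a * p_filt * a + q
    return predictions


if __name__ == "__main__":
    # With r = 0 the filter trusts each observation fully, so every
    # prediction equals the previous observation (a random-walk forecast).
    print(kalman_one_step([3.0, 5.0, 4.0], a=1.0, c=1.0, q=1.0, r=0.0))  # [0.0, 3.0, 5.0]
```

Each step reuses the previous state estimate and its variance, which is why the filter is described as recursive: no past observations need to be stored.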
5. Testing the accuracy of predictive results
Accuracy is often referred to as goodness of fit; it describes how well the prediction model reproduces the observed data. Several criteria can be used to assess the predictive accuracy of a model, including [Tohme, 2012]:
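1- The Mean Square Error is defined by the following formula:
$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(Y_t - \hat{Y}_t\right)^2$$
where $Y_t$ represents the true values of the series, $\hat{Y}_t$ represents the predicted values of the series, and $n$ represents the length of the prediction period.
2- The Mean Absolute Error is given by the following formula:
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|Y_t - \hat{Y}_t\right|$$
3- The Mean Absolute Percentage Error is calculated as follows:
$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{Y_t - \hat{Y}_t}{Y_t}\right|$$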
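The three accuracy criteria translate directly into code. The sketch below assumes equal-length sequences of actual and forecast values, non-zero actual values for MAPE, and reports MAPE in percent; the values in the usage example are hypothetical.

```python
def mse(actual, forecast):
    """Mean square error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    actual = [100.0, 200.0]      # hypothetical values
    forecast = [90.0, 250.0]
    print(mse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```

MSE penalizes large errors more heavily than MAE, while MAPE expresses errors relative to the level of the series, so the three criteria need not rank two models in the same order.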
6. The practical side
This research uses real economic data previously analyzed by Box and Jenkins (1976), in which the input $U_t$ is a leading indicator and the output $Y_t$ is sales; the data comprise 150 pairs of inputs and outputs [Box & Jenkins, 1996]. The input and output series are plotted in Fig. 1 and Fig. 2.
Fig. 1 and Fig. 2 show that the series are non-stationary, so, as in Box and Jenkins (1976), the first differences of both series were used. The first 145 observations were used to build the models and the remaining observations were used for prediction.
State space models of different orders were fitted to the data, and the best model was selected as the one with the lowest values of the statistical criteria, as shown in Table 1.
Table 1. State space models of various orders with their selection criteria
Model order | AIC | Loss function | FPE
1 | 0.5310 | 1.3924 | 1.70185
2 | 0.2839 | 0.890379 | 1.33557
3 | -0.4758 | 0.341028 | 0.633337
4 | -0.9273 | 0.13246 | 0.16858
5 | 0.6018 | 0.481181 | 2.4059
Table 1 shows that the fourth-order state space model has the lowest AIC, loss function and FPE values, so it was selected as the best model, with D = 0. Using MATLAB, this best state space model was also expressed in ARMAX form. To check the adequacy of the fitted state space model, the residual series was plotted; it appears random, as shown in the following figure:
Figure 3. Plot of the residual series of the state space model
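The selection rule applied to Table 1 (take the order with the smallest value of each criterion) can be checked with the few lines below, which simply transcribe the values of Table 1; the names `TABLE_1` and `best_order` are hypothetical.

```python
# Criteria from Table 1, keyed by model order: (AIC, loss function, FPE).
TABLE_1 = {
    1: (0.5310, 1.3924, 1.70185),
    2: (0.2839, 0.890379, 1.33557),
    3: (-0.4758, 0.341028, 0.633337),
    4: (-0.9273, 0.13246, 0.16858),
    5: (0.6018, 0.481181, 2.4059),
}


def best_order(table, criterion):
    """Return the order with the smallest value of one criterion (0 = AIC, 1 = loss, 2 = FPE)."""
    return min(table, key=lambda order: table[order][criterion])


if __name__ == "__main__":
    for index, name in enumerate(("AIC", "Loss function", "FPE")):
        print(name, best_order(TABLE_1, index))  # order 4 for every criterion
```

All three criteria agree on order 4, which is why the choice does not depend on which criterion is preferred.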
The fitted state space model was then used to forecast observations 146 to 150, as shown in Table 2.
Table 2. The prediction values of the state space model.
Observation | Original value | Forecast value
146 | 263.3 | 263.7
147 | 262.8 | 262.3
148 | 261.8 | 262.8
149 | 262.2 | 262.6
150 | 262.7 | 262.2
Next, the matching patterns method was applied to the research data: the series of differences and the pattern vector (obtained with Eq. (4)) were computed, as shown in Table 3.
Table 3. The original data with the series of differences and the series of patterns
T | Yi | Pattern | Difference | T | Yi | Pattern | Difference
1 | 200.1 | 0 | -0.6 | 74 | 209.7 | 0 | -0.9
2 | 199.5 | 0 | -0.1 | 75 | 208.8 | 2 | 0.0
3 | 199.4 | 0 | -0.5 | 76 | 208.8 | 2 | 0.0
4 | 198.9 | 1 | 0.1 | 77 | 208.8 | 1 | 1.8
5 | 199 | 1 | 1.2 | 78 | 210.6 | 1 | 1.3
6 | 200.2 | 0 | -1.6 | 79 | 211.9 | 1 | 0.9
7 | 198.6 | 1 | 1.4 | 80 | 212.8 | 0 | -0.3
8 | 200 | 1 | 0.3 | 81 | 212.5 | 1 | 2.3
9 | 200.3 | 1 | 0.9 | 82 | 214.8 | 1 | 0.5
10 | 201.2 | 1 | 0.4 | 83 | 215.3 | 1 | 2.2
11 | 201.6 | 0 | -0.1 | 84 | 217.5 | 1 | 1.3
12 | 201.5 | 2 | 0.0 | 85 | 218.8 | 1 | 1.9
13 | 201.5 | 1 | 2.0 | 86 | 220.7 | 1 | 1.5
14 | 203.5 | 1 | 1.4 | 87 | 222.2 | 1 | 4.5
15 | 204.9 | 1 | 2.2 | 88 | 226.7 | 1 | 1.7
16 | 207.1 | 1 | 3.4 | 89 | 228.4 | 1 | 4.8
17 | 210.5 | 2 | 0.0 | 90 | 233.2 | 1 | 2.5
18 | 210.5 | 0 | -0.7 | 91 | 235.7 | 1 | 1.4
19 | 209.8 | 0 | -0.1 | 92 | 237.1 | 1 | 3.5
20 | 208.8 | 1 | 0.7 | 93 | 240.6 | 1 | 3.2
21 | 209.5 | 1 | 3.7 | 94 | 243.8 | 1 | 1.5
22 | 213.2 | 1 | 0.5 | 95 | 245.3 | 1 | 0.7
23 | 213.7 | 1 | 1.4 | 96 | 246 | 1 | 0.3
24 | 215.1 | 1 | 3.6 | 97 | 246.3 | 1 | 1.4
25 | 218.7 | 1 | 1.1 | 98 | 247.7 | 0 | -0.1
26 | 219.8 | 1 | 0.7 | 99 | 247.6 | 1 | 0.2
27 | 220.5 | 1 | 3.3 | 100 | 247.8 | 1 | 1.6
28 | 223.8 | 0 | -1.0 | 101 | 249.4 | 0 | -0.4
29 | 222.8 | 1 | 1.0 | 102 | 249 | 1 | 0.9
30 | 223.8 | 0 | -2.1 | 103 | 249.9 | 1 | 0.6
31 | 221.7 | 1 | 0.6 | 104 | 250.5 | 1 | 1.0
32 | 222.3 | 0 | -1.5 | 105 | 251.5 | 0 | -2.5
33 | 220.8 | 0 | -1.4 | 106 | 249 | 0 | -1.4
34 | 219.4 | 1 | 0.7 | 107 | 247.6 | 1 | 1.2
35 | 220.1 | 1 | 0.5 | 108 | 248.8 | 1 | 1.6
36 | 220.6 | 0 | -1.7 | 109 | 250.4 | 1 | 0.3
37 | 218.9 | 0 | -1.1 | 110 | 250.7 | 1 | 2.3
38 | 217.8 | 0 | -0.1 | 111 | 253 | 1 | 0.7
39 | 217.7 | 0 | -2.7 | 112 | 253.7 | 1 | 1.3
40 | 215.0 | 1 | 0.3 | 113 | 255 | 0 | -38.8
41 | 215.3 | 1 | 0.6 | 114 | 216.2 | 1 | 39.8
42 | 215.9 | 1 | 0.8 | 115 | 256 | 1 | 1.4
43 | 216.7 | 2 | 0.0 | 116 | 257.4 | 1 | 3.0
44 | 216.7 | 1 | 1.0 | 117 | 260.4 | 0 | -0.4
45 | 217.7 | 1 | 1.0 | 118 | 260 | 1 | 1.3
46 | 218.7 | 1 | 4.2 | 119 | 261.3 | 0 | -0.9
47 | 222.9 | 1 | 2.0 | 120 | 260.4 | 1 | 1.2
48 | 224.9 | 0 | -2.7 | 121 | 261.6 | 0 | -0.8
49 | 222.2 | 0 | -1.5 | 122 | 260.8 | 0 | -1.0
50 | 220.7 | 0 | -0.7 | 123 | 259.8 | 0 | -0.8
51 | 220 | 0 | -1.3 | 124 | 259 | 0 | -0.1
52 | 218.7 | 0 | -1.7 | 125 | 258.9 | 0 | -1.5
53 | 217 | 0 | -1.1 | 126 | 257.4 | 1 | 0.3
54 | 215.9 | 0 | -0.1 | 127 | 257.7 | 1 | 0.2
55 | 215.8 | 0 | -1.7 | 128 | 257.9 | 0 | -0.5
56 | 214.1 | 0 | -1.8 | 129 | 257.4 | 0 | -0.1
57 | 212.3 | 1 | 1.6 | 130 | 257.3 | 0 | -9.7
58 | 213.9 | 1 | 0.7 | 131 | 247.6 | 1 | 11.3
59 | 214.6 | 0 | -1.0 | 132 | 258.9 | 0 | -1.1
60 | 213.6 | 0 | -1.5 | 133 | 257.8 | 0 | -0.1
61 | 212.1 | 0 | -0.7 | 134 | 257.7 | 0 | -0.5
62 | 211.4 | 1 | 1.7 | 135 | 257.2 | 1 | 0.3
63 | 213.1 | 0 | -0.2 | 136 | 257.5 | 0 | -0.7
64 | 212.9 | 1 | 0.4 | 137 | 256.8 | 1 | 0.7
65 | 213.3 | 0 | -1.8 | 138 | 257.5 | 0 | -0.5
66 | 211.5 | 1 | 0.8 | 139 | 257 | 1 | 0.6
67 | 212.3 | 1 | 0.7 | 140 | 257.6 | 0 | -0.3
68 | 213 | 0 | -2.0 | 141 | 257.3 | 1 | 0.2
69 | 211 | 0 | -0.3 | 142 | 257.5 | 1 | 2.1
70 | 210.7 | 0 | -0.6 | 143 | 259.6 | 1 | 1.5
71 | 210.1 | 1 | 1.3 | 144 | 261.1 | 1 | 1.8
72 | 211.4 | 0 | -1.4 | 145 | 262.9 | * | *
73 | 210 | 0 | -0.3 | | | |
Using the Singh (2001) algorithm, we first find the matches of the pattern of size k = 2; the best match is the one that gives the lowest value of Eq. (5), and the remaining series values are predicted in the same way. The same steps are then repeated for the other pattern sizes (k = 3, 4, 5). The forecasts obtained with this fuzzy matching patterns algorithm are shown in Table 4.
Table 4. Forecast values of the matching patterns method
Observation | Original value | Forecast value
146 | 263.3 | 255.66
147 | 262.8 | 217.45
148 | 261.8 | 227.24
149 | 262.2 | 228.52
150 | 262.7 | 228.52
Comparing the predictions of the matching patterns method with those of the state space model, using the criteria computed for each, shows that the state space model gives the better predictions according to the prediction accuracy criteria, as shown in Table 5.
Table 5. Values of the forecast accuracy criteria for the two models
Model | MSE | MAE | MAPE
Matching patterns | 2.25001 | 1.24900 | 86.4987
State space | 0.100519 | 0.608259 | 122.657
7. Conclusions
The main conclusions of the research are:
The practical application showed that linear dynamic models, namely state space models, give better predictive values than the matching patterns method.
The mean square error, the forecast error and the prediction accuracy criteria of the linear dynamic models represented by state space models were lower than those of the matching patterns method, which indicates the superiority of the dynamic models over the matching patterns in predicting the future values of the case study.
The same study could be extended to more sophisticated pattern matching algorithms, comparing their predictions with those of dynamic models to determine which gives better results.