Cross-Covariance Weight of GSTAR-SUR Model for Rainfall Forecasting in Agricultural Areas

Location weights contribute to the accuracy of a spatio-temporal model. Commonly used location weights include uniform weights, inverse distance weights, and cross-correlation normalization weights. These weights reflect the proximity between locations. For data with high variability, however, they are less relevant. This research aims to obtain a weighting method that is more suitable for data with high variability. It uses secondary data, namely 10-day rainfall data obtained from BMKG Karangploso for the period January 2008 to December 2018. The rain posts studied are those of the Blimbing, Karangploso, Singosari, Dau, and Wagir regions. The forecasting model obtained is the GSTAR((1), 1,2,3,12,36)-SUR model. The model with cross-covariance weights is more accurate, with lower RMSE values and higher R² values, especially for the Karangploso, Dau, and Wagir areas.


INTRODUCTION
Several spatio-temporal models have been developed. The Space-Time Autoregressive (STAR) model was first introduced by Pfeifer & Deutsch [1], [2]. The STAR model assumes that the variance is the same (homogeneous) across locations. In practice, however, observation sites are often heterogeneous, so the STAR model is not suitable for data with heterogeneous location characteristics. This weakness is addressed by the Generalized Space-Time Autoregressive (GSTAR) and GSTAR-OLS models developed by Borovkova, Lopuha, & Ruchjana [3] and Ruchjana [4], [5]. The GSTAR model is intended for data that meet the stationarity assumption. A later development is the GSTAR-SUR model, developed by Iriany [6] to handle data that are not stationary and have a seasonal pattern. Furthermore, the GSTAR-SUR-NN hybrid model was developed for data with a nonlinear pattern [7]-[9].
The location weights used in building a spatio-temporal model also contribute to its accuracy. Commonly used location weights are uniform weights, inverse distance weights, and cross-correlation normalization weights [10], [11]. These weights reflect the proximity between locations. For data with high variability, they are less relevant. Therefore, location weights are needed that take the variability of the observational data into account. One such weight is the variance ratio weight, which has been shown to give better accuracy [12]. Another is the cross-covariance weight.
Cross-covariance weights have been studied and applied by Apanosovich and Genton [13] to predict pollution in California and by Efromovich & Smirnova [14] for fMRI imaging with a wavelet approach. This research determines the accuracy of the GSTAR model built with cross-covariance weights and compares it with the accuracy of the GSTAR model built with cross-correlation weights.

METHODS
The data period is January 2008 to December 2018. Data from January 2008 to December 2017 are used for training (in-sample), while data from January 2018 to December 2018 are used for testing (out-of-sample). The first step is testing the stationarity of the rainfall data. Stationarity in the mean is tested with the Augmented Dickey-Fuller test, and stationarity in the variance with the Box-Cox test. The next step is to identify the significant MACF and MPACF lags to determine the order used in estimating the GSTAR model. Next, the cross-correlation normalization weights are calculated with Equation 1 [11], and the cross-covariance normalization weights with Equation 2 [13], [14]. The next process is GSTAR-OLS analysis to obtain the residuals with Equation 3. Next, the matrix $\Omega = \mathrm{var}(\varepsilon)$ is calculated. The GSTAR(1, p)-SUR parameters are then estimated using the formula $\hat{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$. The best model is chosen based on the RMSE and R² prediction values. The data analysis was carried out using R and SAS software.
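The weight normalization step can be sketched in Python (the analysis itself was carried out in R and SAS). This is a minimal sketch under stated assumptions: the cross-correlation normalization weight is taken in its commonly used form, w_ij = r_ij(1) / Σ_{k≠i} |r_ik(1)|, and the cross-covariance weight is assumed to use the same normalization with the lag-1 sample cross-covariance in place of the cross-correlation. The function names and the toy series are hypothetical.

```python
from math import sqrt

def cross_cov(x, y, lag=1):
    """Sample cross-covariance between x(t) and y(t - lag)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((x[t] - mx) * (y[t - lag] - my) for t in range(lag, n)) / n

def cross_corr(x, y, lag=1):
    """Sample cross-correlation: cross-covariance scaled by both standard deviations."""
    return cross_cov(x, y, lag) / sqrt(cross_cov(x, x, 0) * cross_cov(y, y, 0))

def normalized_weights(series, measure, lag=1):
    """Assumed normalization: w_ij = m_ij / sum_{k != i} |m_ik|, with w_ii = 0."""
    n = len(series)
    weights = []
    for i in range(n):
        raw = [measure(series[i], series[j], lag) if j != i else 0.0
               for j in range(n)]
        total = sum(abs(v) for v in raw)  # assumes at least one non-zero entry
        weights.append([v / total for v in raw])
    return weights

# Toy rainfall-like series for three hypothetical locations (not the study data).
toy = [
    [0.0, 12.0, 30.0, 5.0, 0.0, 44.0, 18.0, 2.0],
    [1.0, 20.0, 25.0, 9.0, 0.0, 60.0, 10.0, 0.0],
    [0.0, 5.0, 80.0, 0.0, 3.0, 20.0, 35.0, 7.0],
]
W_corr = normalized_weights(toy, cross_corr)
W_cov = normalized_weights(toy, cross_cov)
```

In this form the absolute weights in each row sum to 1 for both matrices. The difference is that the cross-covariance keeps the scale of each neighboring series, so a highly variable neighbor receives relatively more weight than under cross-correlation.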

Agus Dwi Sulistyono 51

RESULTS AND DISCUSSION
This research used 10-day rainfall data from the rain posts of the Blimbing, Karangploso, Singosari, Dau, and Wagir regions. Descriptive statistics of the rainfall data at the five locations are presented in Table 1. Table 1 shows that average rainfall is highest in Wagir District and lowest in Singosari District. At all study locations, the standard deviation is greater than the mean, indicating high rainfall variation. In addition, the heterogeneity of the observation locations can be measured with the Gini index: the higher the index, the more heterogeneous the locations. For the five locations in this study, the Gini index is 0.975, close to 1. This indicates that the locations are heterogeneous, so modeling with the GSTAR-SUR model is appropriate.
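The two variability checks described above can be illustrated with a short Python sketch. It assumes the standard Gini coefficient for non-negative values; the text does not state which Gini formula was used, so this may differ from the calculation behind the reported 0.975. The function names and the toy values are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean; above 1 when the spread exceeds the mean."""
    return pstdev(values) / mean(values)

def gini(values):
    """Standard Gini coefficient for non-negative values (0 means all values equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n

# Toy rainfall-like values with many dry periods (not the study data).
toy_rain = [0, 0, 0, 2, 0, 55, 0, 120, 3, 0]
cv = coefficient_of_variation(toy_rain)
g = gini(toy_rain)
```

For series dominated by zeros and a few large values, both measures are high, which matches the pattern described for the rainfall data.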
Stationarity in the variance was tested using a Box-Cox plot. Stationarity in the variance is fulfilled if the Box-Cox plot gives λ = 1. If λ ≠ 1, the data are transformed. The results for each location are presented in Table 4.3. Based on Table 4.3, the initial λ values at all study locations are not equal to 1. This shows that the rainfall data at each location are not yet stationary in the variance, so a Box-Cox transformation is needed. After the first Box-Cox transformation, λ = 1, which means that the data are stationary in the variance and no further transformation is needed.
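For reference, the Box-Cox transformation can be sketched as follows. This is a minimal sketch assuming the standard one-parameter form, y = (x^λ − 1) / λ for λ ≠ 0 and y = ln(x) for λ = 0. The `shift` argument is a hypothetical addition for series that contain zeros, since the transformation requires positive values; the text does not say how zero rainfall was handled.

```python
from math import log

def box_cox(values, lam, shift=0.0):
    """Standard one-parameter Box-Cox transform of positive values."""
    transformed = []
    for x in values:
        x = x + shift  # hypothetical shift to make zero values positive
        if x <= 0:
            raise ValueError("Box-Cox requires positive values; increase shift")
        if lam == 0:
            transformed.append(log(x))
        else:
            transformed.append((x ** lam - 1.0) / lam)
    return transformed

# lam = 1 only shifts the data by -1, so the shape of the series is unchanged.
unchanged = box_cox([1.0, 2.0, 3.0], lam=1)
# lam = 0.5 is a square-root type transform that compresses large values.
compressed = box_cox([4.0, 9.0, 16.0], lam=0.5)
```

Because λ = 1 leaves the series unchanged apart from a constant shift, an estimated λ of 1 is read as a sign that no further variance-stabilizing transformation is needed.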
In addition to stationarity in the variance, stationarity in the mean was tested using the Augmented Dickey-Fuller (ADF) test. Stationarity in the mean is fulfilled if the ADF test gives a p-value of less than 0.05. If the p-value is more than 0.05, differencing is needed. Based on the ADF test results in Table 3, the p-value at each location is less than 0.05. This shows that the rainfall data are stationary in the mean.
The GSTAR model identification process is done by looking at the Matrix Partial Autocorrelation Function (MPACF) scheme.

Table 4. Matrix Partial Autocorrelation Scheme (MPACF)
Based on the MPACF scheme in Table 4, there are significant MPACF values at lags 1 to 3. At lag 4, there is no significant partial autocorrelation, while at lag 5 and beyond some partial autocorrelations are again significant. The MPACF scheme therefore shows that the significant partial autocorrelation cuts off at lag 4, so the VAR order (p) is determined by the smallest AIC value among the significant lags. Based on the AIC values for lags 1 to 3 in Table 5, the lowest AIC value is obtained at order 3. Thus, the GSTAR model used has order 3. In addition to determining the order with the AIC value, the GSTAR model is also identified from univariate ACF and PACF plots at each location. The ACF plots indicate that the rainfall data at each location have a seasonal pattern, seen as a pattern that repeats at certain time lags. The PACF plots in Appendix 8 show that at some time lags the PACF crosses the 5% boundary line. Across the five locations, the PACF is significant at time lags 12 and 36. Therefore, the identification of seasonal patterns indicates that the appropriate model is GSTAR((1), 1,2,3,12,36).
This study uses five locations, so at spatial order (1) the number of locations adjacent to the i-th location is 4, and the cross-correlation normalization matrix is built on this basis. The predicted rainfall data at each location are shown in a plot.
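The structure of the GSTAR((1), 1,2,3,12,36) model can be illustrated by building the regressors for one location. The following is a minimal sketch, assuming the common GSTAR form in which each time lag k contributes an own-lag term Z_i(t−k) and a spatially weighted neighbor term Σ_j w_ij Z_j(t−k). The function name `gstar_design`, the uniform weights, and the toy data are hypothetical and are not taken from the study.

```python
LAGS = [1, 2, 3, 12, 36]  # time lags of the identified model

def gstar_design(Z, W, i, lags=LAGS):
    """Build (X, y) for location i.

    Z is a list of N equal-length series and W is an N x N weight matrix
    with a zero diagonal. Each row of X holds, for every lag k, the pair
    (own lag, weighted neighbor lag).
    """
    n_locations = len(Z)
    n_times = len(Z[0])
    start = max(lags)  # first time index with all lags available
    X, y = [], []
    for t in range(start, n_times):
        row = []
        for k in lags:
            own = Z[i][t - k]
            neighbors = sum(W[i][j] * Z[j][t - k] for j in range(n_locations))
            row.extend([own, neighbors])
        X.append(row)
        y.append(Z[i][t])
    return X, y

# Five toy locations with 40 time points each: Z[j][t] = 100 * j + t.
Z = [[100 * j + t for t in range(40)] for j in range(5)]
# Uniform weights: each of the 4 neighbors gets 1/4, as a stand-in for W.
W = [[0.0 if i == j else 0.25 for j in range(5)] for i in range(5)]
X, y = gstar_design(Z, W, i=0)
```

Each location then has 10 regressors (two per lag). In the SUR step, the equations of all locations would be stacked and estimated jointly with the residual covariance matrix Ω.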

... in the two models are compared, the RMSE values of the two models are relatively the same. Besides the RMSE value, the accuracy of the model is also checked by calculating the R² prediction value at each location. As shown in Table 6, the R² prediction values of the GSTAR((1), 1,2,3,12,36)-SUR model with cross-covariance weights are higher than those of the model with cross-correlation weights, except in Blimbing and Singosari Districts.
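The two accuracy measures can be sketched as follows. This is a minimal sketch assuming the usual definitions: RMSE as the square root of the mean squared forecast error, and R² prediction as 1 − SSE/SST computed on the out-of-sample data. The exact R² prediction formula is not given in the text, so this form is an assumption, and the numbers below are toy values rather than study results.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between observations and forecasts."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))

def r2_prediction(actual, predicted):
    """Assumed form: 1 - SSE / SST on the out-of-sample observations."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - sse / sst

# Toy out-of-sample values (not the study data).
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 37.0]
toy_rmse = rmse(observed, forecast)
toy_r2 = r2_prediction(observed, forecast)
```

A model is preferred when its RMSE is lower and its R² prediction is higher at the same location, which is the comparison made between the two weighting schemes.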

CONCLUSIONS
The cross-covariance model produces a better level of accuracy in terms of lower RMSE values and higher R² values, especially for the Karangploso, Dau, and Wagir areas.