The Generalized STAR Modeling with Heteroscedastic Effects
Utriweni Mukhaiyar

Most Generalized Space-Time Autoregressive (GSTAR) models assume a constant error variance. In practice, many space-time observations have variability that changes over time. In this study, a GSTAR model is built with a non-constant (heteroscedastic) error variance by combining GSTAR with the Autoregressive Conditional Heteroscedasticity (ARCH) model. The parameters of the GSTAR-ARCH model are estimated using the Generalized Least Squares (GLS) method to obtain efficient parameter estimates. As a case study, the GSTAR-ARCH model is applied to daily mean wind speed data from New Orleans, Florida, and Mississippi in order to predict the occurrence of Hurricane Katrina in 2005. Including heteroscedasticity in GSTAR modeling gives better predictions than the homoscedastic approach. Furthermore, the higher-order GSTAR model performs better, as shown by its smaller Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). The results show that the GSTAR(3;0,0,1)-ARCH(1) model predicts Hurricane Katrina better than the GSTAR(3;0,0,1) and GSTAR(1;1)-ARCH(1) models.


INTRODUCTION
A hurricane is a natural phenomenon in the form of wind gusts with speeds exceeding 119 km/hour. Hurricanes are a type of tropical cyclone that usually forms over warm sea surfaces near the equator. The wind speed at one location is influenced by the wind speed at that location at previous times and by the average wind speed at other locations. This means that wind speed can be modeled with space-time models such as STARMA(p,q), STAR(p), STMA(q), GSTAR(p,q), and STARMAG(p,q). The model used in this study is the GSTAR model, which uses a weight matrix chosen according to the conditions of the locations. The arrival of a storm, which is usually predicted from weather satellites, is predicted here using the GSTAR model. By using daily average wind speed data from the year before the storm, it is hoped that the arrival of the storm can be predicted earlier, reducing the loss of both property and life. However, the large increase in wind speed when a storm occurs causes heteroscedasticity in the data, so that the errors generated by the GSTAR model have a non-constant variance. The initial parameter estimates are then no longer efficient, so a model that accounts for the non-constant error variance, namely the ARCH model, is needed. The ARCH model is used to model the error variance, so that the heteroscedasticity is removed and more efficient parameter estimates are obtained.
The GSTAR model has developed rapidly in Indonesia, both in theory and in application. Theoretical work includes the stationarity of the process using the Inverse Autocovariance Matrix [1] as well as the kernel approach [2,3] and the minimum spanning tree approach [4], GSTAR with correlated errors [5,6], GSTAR with outliers [7], GSTAR for discrete data [8], and invertibility of the kernel GSTAR model [9]. The GSTAR model has been applied to economic data [10], tea plantations [11], palm oil production [12], red chili commodity prices [13], the number of dengue fever cases [14], predictions of robbery cases in Medan, North Sumatra [15], the spread of Covid-19 cases in Java [16], and the vertical distribution of copper and gold grades [17]. The ARCH model, on the other hand, accommodates heteroscedasticity and exogenous variables that make a process highly volatile, and has been widely developed for economic problems. The impact of Covid-19 as an exogenous factor on the economic sector was explored in [18]. Sometimes an exogenous factor causes a change point, which should be detected [19]. However, the ARCH model has also been applied to predict electric current [20], caterpillar pests in oil palm plantations [21], and rice prices [22].
A GSTAR model whose error variance accounts for the heteroscedasticity effect was investigated in [23] for the GSTAR(1,1) model, with an application to stock prices. The construction of a GSTAR model with the ARCH effect, with parameters estimated by the maximum likelihood method, was explored in [24]. In this study, the GSTAR-ARCH model is developed with parameters estimated by the Generalized Least Squares (GLS) approach.

Generalized STAR Model
The GSTAR model is a generalization of the STAR model in which the model parameters, initially assumed to be the same for all locations, may differ between locations. An observation at location $i$ at time $t$ is denoted by $Z_{i,t}$. If the observations at different locations are related, they can be modeled using the GSTAR model. The general form of GSTAR is
$$\mathbf{Z}_t = \sum_{j=1}^{p} \sum_{\ell=0}^{\lambda_j} \boldsymbol{\Phi}_{j\ell} \mathbf{W}^{(\ell)} \mathbf{Z}_{t-j} + \mathbf{e}_t,$$
where $\mathbf{Z}_t = (Z_{1,t}, Z_{2,t}, \ldots, Z_{N,t})'$ is an $N$-dimensional column vector, $\mathbf{W}^{(\ell)}$ is the $N \times N$ weight matrix for spatial lag $\ell$ (with $\mathbf{W}^{(0)} = \mathbf{I}$), $\boldsymbol{\Phi}_{j\ell}$ is the $N \times N$ diagonal matrix of autoregressive parameters for time lag $j$ and spatial lag $\ell$, $p$ is the autoregressive order, $\lambda_j$ is the spatial order of time lag $j$, and $\mathbf{e}_t$ is the $N$-dimensional vector of errors corresponding to the observations in $\mathbf{Z}_t$. For the homoscedastic GSTAR model, $\mathbf{e}_t$ is a white noise vector with constant mean and variance that follows a multivariate normal distribution, whereas in the heteroscedastic case the variance is not constant.
GSTAR modeling follows the Box-Jenkins iteration [1], which consists of model identification, parameter estimation, and diagnostic checking. Before GSTAR modeling, the data must be stationary. If they are not, the data should be differenced until they are stationary. The parameters of the GSTAR model can be estimated using the Ordinary Least Squares (OLS) method by writing the GSTAR model in the linear form $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, so that the OLS estimator is $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ [1].
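The OLS step can be sketched in Python for the first-order case, GSTAR(1;1). This is a minimal illustration and not the code used in the study: the function name `gstar11_ols` and the array layout are assumptions, and the per-location regression relies on the fact that the stacked design matrix is block-diagonal, so each location can be fitted separately.

```python
import numpy as np

def gstar11_ols(Z, W):
    """OLS estimates for a GSTAR(1;1) model (hypothetical helper).

    Z : (T, N) array of centered, stationary observations.
    W : (N, N) spatial weight matrix (zero diagonal, rows sum to one).
    Returns phi10, phi11 (length N each) and residuals of shape (T-1, N).
    """
    Z = np.asarray(Z, dtype=float)
    W = np.asarray(W, dtype=float)
    Y = Z[1:]              # responses Z_t
    own = Z[:-1]           # own lag Z_{t-1}
    nbr = Z[:-1] @ W.T     # spatially weighted lag W Z_{t-1}
    N = Z.shape[1]
    phi10 = np.empty(N)
    phi11 = np.empty(N)
    resid = np.empty_like(Y)
    for i in range(N):
        # Regressors for location i: its own lag and its neighbors' weighted lag
        X = np.column_stack([own[:, i], nbr[:, i]])
        beta, *_ = np.linalg.lstsq(X, Y[:, i], rcond=None)
        phi10[i], phi11[i] = beta
        resid[:, i] = Y[:, i] - X @ beta
    return phi10, phi11, resid
```

The residuals returned here are the quantities that would later be checked for an ARCH effect.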

ARCH(1) Model
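Consider a process $\{Y_t\}$ that follows an AR($p$) model, so that it can be written as
$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t,$$
where the errors $e_t$ are uncorrelated but have a variance that is not constant and depends on time. Based on Engle (1982), the error can be expressed as
$$e_t = \sigma_t \varepsilon_t,$$
where $\varepsilon_t$ is a random sample of independent and identically distributed standard normal variables and $\sigma_t^2$ is the conditional variance of $e_t$. If the errors are known up to time $(t-1)$, the conditional variance of $e_t$ is
$$\sigma_t^2 = \operatorname{Var}(e_t \mid e_{t-1}, \ldots, e_{t-p}) = \alpha_0 + \alpha_1 e_{t-1}^2 + \cdots + \alpha_p e_{t-p}^2. \tag{2}$$
From Eq. (2), the conditional variance of $e_t$ depends on the squares of the past errors and is not constant. This is called the ARCH($p$) model [25].
The simplest form of the ARCH($p$) model, and the one used in this study, is the ARCH(1) model. In this model, the error variance at time $t$ is affected by the square of the error one time lag earlier. The ARCH(1) model is formulated as $e_t = \sigma_t \varepsilon_t$ and $\sigma_t^2 = \alpha_0 + \alpha_1 e_{t-1}^2$, where $\alpha_0$ and $\alpha_1$ are the non-negative parameters of the ARCH(1) model.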

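The variance of the ARCH(1) errors is
$$\operatorname{Var}(e_t) = E(e_t^2) = E(\sigma_t^2)\,E(\varepsilon_t^2) = \alpha_0 + \alpha_1 E(e_{t-1}^2) = \alpha_0 + \alpha_1 \operatorname{Var}(e_{t-1}).$$
For a stationary process, $\operatorname{Var}(e_t) = \operatorname{Var}(e_{t-1})$, so it is obtained that
$$\operatorname{Var}(e_t) = \frac{\alpha_0}{1 - \alpha_1}. \tag{3}$$
Since the variance is positive, based on Eq. (3), $\alpha_0 > 0$ and $0 \le \alpha_1 < 1$ are the stationarity conditions of the ARCH(1) model.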
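The ARCH(1) recursion and its unconditional variance $\alpha_0/(1-\alpha_1)$ can be checked numerically. The sketch below is illustrative only: the function name `simulate_arch1` and the parameter values in the usage lines are assumptions, not values from the study.

```python
import numpy as np

def simulate_arch1(alpha0, alpha1, T, seed=0):
    """Simulate T errors from an ARCH(1) process (hypothetical helper).

    Requires alpha0 > 0 and 0 <= alpha1 < 1 (stationarity conditions).
    Returns the errors e_t and their conditional variances sigma2_t.
    """
    if not (alpha0 > 0 and 0 <= alpha1 < 1):
        raise ValueError("need alpha0 > 0 and 0 <= alpha1 < 1")
    rng = np.random.default_rng(seed)
    e = np.zeros(T)
    sigma2 = np.empty(T)
    # Start the recursion at the unconditional variance alpha0 / (1 - alpha1)
    sigma2[0] = alpha0 / (1.0 - alpha1)
    e[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, T):
        sigma2[t] = alpha0 + alpha1 * e[t - 1] ** 2   # conditional variance
        e[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return e, sigma2

# Usage: the sample variance should be close to alpha0 / (1 - alpha1)
e, sigma2 = simulate_arch1(alpha0=1.0, alpha1=0.3, T=100_000)
print(e.var(), 1.0 / (1.0 - 0.3))
```

For long simulated series the two printed numbers should agree closely, while the conditional variances in `sigma2` vary over time.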

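GSTAR(1;1)-ARCH(1) Model
Let $\mathbf{Z}_t = (Z_{1,t}, Z_{2,t}, \ldots, Z_{N,t})'$ be the vector of observations at $N$ locations at time $t$. It can be modeled as GSTAR(1;1)-ARCH(1) if it can be expressed as
$$\mathbf{Z}_t = \boldsymbol{\Phi}_{10}\mathbf{Z}_{t-1} + \boldsymbol{\Phi}_{11}\mathbf{W}\mathbf{Z}_{t-1} + \mathbf{e}_t,$$
where $\mathbf{e}_t \sim N(\mathbf{0}, \boldsymbol{\Omega}_t)$ is the vector of errors, which follows a normal distribution with zero mean and a variance that is not constant over time. The covariance matrix is defined as $\boldsymbol{\Omega}_t = \operatorname{diag}(h_{1,t}, h_{2,t}, \ldots, h_{N,t})$, where $h_{i,t}$ is the error variance at location $i$ at time $t$, which can be modeled as ARCH(1), that is,
$$h_{i,t} = \alpha_{i0} + \alpha_{i1} e_{i,t-1}^2,$$
where $\alpha_{ik}$, $k = 0, 1$, are the model parameters for location $i$.
The assumption used in this model is that the errors at different locations are uncorrelated, so that the error variance at location $i$ at time $t$ is affected by the squared error at that location at time $(t-1)$ but not by the errors at other locations. Meanwhile, the observation at location $i$ is influenced by past observations at that location and at the neighboring locations. The method used to estimate this model is the Generalized Least Squares (GLS) method.
Let the transformed linear model $\mathbf{Y}^* = \mathbf{X}^*\boldsymbol{\beta} + \mathbf{e}^*$ be defined with $\mathbf{Y}^* = \boldsymbol{\Omega}^{-1/2}\mathbf{Y}$, $\mathbf{X}^* = \boldsymbol{\Omega}^{-1/2}\mathbf{X}$, and $\mathbf{e}^* = \boldsymbol{\Omega}^{-1/2}\mathbf{e}$, where $\boldsymbol{\Omega}$ is the covariance matrix of the error vector $\mathbf{e}$. Then the unbiased estimator of the GSTAR-ARCH model parameters is
$$\hat{\boldsymbol{\beta}}_{GLS} = (\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}\mathbf{X}^{*\prime}\mathbf{Y}^* = (\mathbf{X}'\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\Omega}^{-1}\mathbf{Y}$$
[26]. The stages of GSTAR-ARCH modeling are illustrated in the flowchart in Fig. 1.
Figure 1. Flowchart of the GSTAR-ARCH modeling stage.
Modeling is carried out to obtain a homoscedastic error. The equation for the mean is modeled by the GSTAR model, while the variance is modeled by the ARCH model.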
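Because the error covariance matrix in this model is diagonal, the GLS transformation reduces to dividing each row of the linear model by the corresponding error standard deviation. The following sketch shows this under that assumption; the function name `gls_diag` and the argument `h` (the vector of error variances, one per row of the stacked model) are hypothetical names introduced for illustration.

```python
import numpy as np

def gls_diag(X, y, h):
    """GLS estimate for y = X beta + e with Cov(e) = diag(h) (hypothetical helper).

    X : (n, k) design matrix, y : (n,) response, h : (n,) positive error variances.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    h = np.asarray(h, dtype=float)
    if np.any(h <= 0):
        raise ValueError("all error variances must be positive")
    w = 1.0 / np.sqrt(h)        # diagonal of Omega^{-1/2}
    X_star = X * w[:, None]     # X* = Omega^{-1/2} X
    y_star = y * w              # Y* = Omega^{-1/2} Y
    # OLS on the transformed model equals (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta
```

When all entries of `h` are equal, the result coincides with the OLS estimate, which is one way to sanity-check an implementation.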

RESULTS AND DISCUSSION
As a case study, the data used are the average daily wind speed in three states of the United States (N = 3) from September  thousand million dollars and more than 1,800 people died. The states of interest are New Orleans (Louisiana), Florida, and Mississippi, each of which can be seen in Fig. 2.
The modeling is carried out using R. Data to be modeled with a space-time model must first be stationary. Stationarity can be assessed from the time series plot of the observations at each location, as in Fig. 3(a). The figure shows that several wind speed values are higher than the other observations. In addition, there is a slight falling and rising pattern, which indicates that the data are not stationary in the mean. Therefore, the data are differenced first. The series plot after first-order differencing can be seen in Fig. 3(b). The stationary data are then centered so that they have zero mean (centralized process). The process variability, which occasionally increases, indicates that the variance is not constant. This is accommodated in the modeling through the heteroscedastic effect.
In GSTAR modeling, one of the important elements that characterizes the relationship between locations is the weight matrix. The weight matrix has entries $w_{ij}$, which represent the weight of location $j$ with respect to location $i$. This matrix has zeros on the main diagonal, and the weights in each row sum to one. In this study, uniform and binary weight matrices are used, and only one spatial lag is considered.
The first stage in the modeling is model identification with the help of Space-Time ACF and PACF plots, called STACF and STPACF. However, because the model was determined at the beginning, namely the GSTAR(1;1) model, the model identification stage can be skipped. The STACF and STPACF plots are used to see whether the daily average wind speed data show a relationship across time and location. The plots of STACF and STPACF can be seen in Fig. 4.
From Fig. 4 it can be seen that the data have time and spatial dependence, although the GSTAR(1;1) model is not very appropriate for these data. From the STPACF plot, the best candidate model for the data is GSTAR(3;0,0,1). Thus, the models considered are GSTAR(1;1) and GSTAR(3;0,0,1). First, the parameter estimates obtained using the Ordinary Least Squares (OLS) method for the GSTAR(1;1) model can be seen in Table 1. The next step is to test for the presence of an ARCH effect in the errors at each location. The ARCH effect can be detected from the plot of the squared errors at each location, which can be seen in Fig. 5.
To confirm the existence of the ARCH effect, the ARCH-LM test is used (Engle, 1982). The presence of an ARCH effect in the errors is indicated by a p-value smaller than the significance level $\alpha$, where 1% ≤ α ≤ 10%. The ARCH-LM test results for the first six time lags in Table 2 show that the p-value is smaller than $\alpha$ at almost all locations and time lags. It can therefore be concluded that there is an ARCH effect in the GSTAR(1;1) model, which means that the error variance is not constant over time. A slight difference is found at Location 2, Florida, where the p-values are less than 9.2% only up to the third time lag. This indicates that the wind speed in this area tends to be more stable in mean and variance than at the other two locations. At time lags greater than three, the wind speed in Florida does not show any ARCH effect. However, the presence of heteroscedasticity at the two other locations, and in Florida up to the first three time lags, is the reason to consider a non-constant variance in this case. Next, the non-constant error variance of GSTAR(1;1) is modeled by ARCH(1). The parameter estimates of the ARCH(1) model obtained using the Maximum Likelihood (ML) method can be seen in Table 3.
Furthermore, the estimated error variances at each location, for $t = 1, 2, \ldots, T$, form the entries of the diagonal matrix $\boldsymbol{\Omega}_t = \operatorname{diag}(\sigma_{1,t}^2, \sigma_{2,t}^2, \sigma_{3,t}^2)$. This matrix is used to estimate the parameters of the GSTAR(1;1)-ARCH(1) model by the GLS method. The estimated parameters are presented in Table 4. The GLS estimates in Table 4 are not much different from the OLS estimates in Table 1. This is probably because the ARCH(1) model is not the right model for the error variance of the GSTAR(1;1) model.
For comparison, the modeling was repeated with the same steps using a binary weight matrix and a weight matrix based on wind direction. The best model is determined by comparing the Mean Squared Error (MSE) of each model. From Table 5 it can be concluded that the uniform weight matrix in the GSTAR(1;1)-ARCH(1) modeling is slightly better than the binary weight matrix. Because only three locations are involved, the uniform and binary weight matrices have quite similar compositions. The comparison between the original and estimated data using the GSTAR(1;1)-ARCH(1) model can be seen in Fig. 6. The figure shows a large difference between the original data and the estimated results. This is because, as the STPACF plot in Fig. 4 indicates, the GSTAR(1;1) model is not appropriate for the data.
From the STPACF plot, the next candidate space-time model is GSTAR(3;0,0,1), with the error variance again modeled by the ARCH(1) model. This model states that the condition of location $i$ at time $t$ is influenced by its own condition at times $(t-1)$, $(t-2)$, and $(t-3)$ and by the conditions of its closest neighbors at time $(t-3)$. The modeling follows the same steps as for GSTAR(1;1)-ARCH(1) until the parameters are obtained both before and after the ARCH component is included. The parameter estimates can be seen in Table 6. The parameter estimates of the GSTAR(3;0,0,1) model are not much different from those of the GSTAR(3;0,0,1)-ARCH(1) model, so the estimation results are also not far apart.
The comparison of the original and estimated data using the GSTAR(3;0,0,1)-ARCH(1) model can be seen in Fig. 7. Comparing the plots in Fig. 6 and Fig. 7 shows that the estimation results generated by this model are better than those of the GSTAR(1;1)-ARCH(1) model. The GSTAR(3;0,0,1)-ARCH(1) model can capture the pattern of process variability. However, this model is not the best model for the data. From Fig. 7 it can be seen that the estimation results cannot reach the very high or very low values of the original data. This may be due to the inaccurate selection of the ARCH(1) model as the model of the error variance at each location. The MSE value for the GSTAR(3;0,0,1)-ARCH(1) model is 0.875. This value is smaller than the MSE value of the GSTAR(1;1)-ARCH(1) model, so the model used for short-term prediction is the GSTAR(3;0,0,1)-ARCH(1) model. To confirm the selection of the GSTAR(3;0,0,1)-ARCH(1) model, it was compared with the GSTAR(3;0,0,1) model without the heteroscedasticity effect. Table 7 shows the comparison of MSE, MAD, and MAPE values for the two models.
Although these values are not significantly different, the model used for short-term prediction is the GSTAR(3;0,0,1)-ARCH(1) model, since it is slightly better.
Table 7. Comparison of the GSTAR(3;0,0,1) and GSTAR(3;0,0,1)-ARCH(1) models
From Table 8, it can be concluded that the GSTAR(3;0,0,1)-ARCH(1) model is not very good at estimating changes in the daily average wind speed that are too large. In a relatively short period, August 25-29, 2005, there was an increase in the daily average wind speed. If this increase is taken to be the beginning of Hurricane Katrina, then Hurricane Katrina is predicted to hit New Orleans and Mississippi on September 1, 2005, which is three days later than the actual time of hurricane landfall in New Orleans and Mississippi.
Table 8. Comparison of the real data and its estimates using the GSTAR(3;0,0,1)-ARCH(1) model
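The differencing and centering steps described above can be sketched as follows. The study itself used R; this Python version and the function name `difference_and_center` are illustrative assumptions.

```python
import numpy as np

def difference_and_center(X):
    """First-difference each location's series, then center it (hypothetical helper).

    X : (T, N) array, one column per location.
    Returns a (T-1, N) array whose columns have zero mean.
    """
    X = np.asarray(X, dtype=float)
    D = np.diff(X, axis=0)        # first difference: X_t - X_{t-1}
    return D - D.mean(axis=0)     # subtract each location's mean
```

The output of this step is the series $\mathbf{Z}_t$ that enters the GSTAR model.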
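A uniform weight matrix, as defined above, gives each neighbor of a location the same weight, so that each row sums to one and the diagonal is zero. The sketch below builds such a matrix from a neighbor indicator matrix. The function name `uniform_weights` and the adjacency pattern in the usage lines (all three locations treated as mutual neighbors) are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def uniform_weights(adj):
    """Row-normalize a 0/1 neighbor matrix into uniform weights (hypothetical helper)."""
    A = np.asarray(adj, dtype=float).copy()
    np.fill_diagonal(A, 0.0)                   # a location is not its own neighbor
    row_sums = A.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every location needs at least one neighbor")
    return A / row_sums                        # each row sums to one

# Usage with an assumed adjacency: every location neighbors the other two
W = uniform_weights(np.ones((3, 3)))
print(W)
```

Under this assumed adjacency, every off-diagonal weight equals 1/2.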
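The ARCH-LM test regresses the squared residuals on their own first $q$ lags and uses the number of observations times the $R^2$ of that regression as the test statistic, which is asymptotically chi-square distributed with $q$ degrees of freedom under the null hypothesis of no ARCH effect. A minimal sketch of the statistic is given below; the function name `arch_lm_stat` is hypothetical, and the study's own computations were done in R.

```python
import numpy as np

def arch_lm_stat(resid, q):
    """ARCH-LM statistic for q lags of the squared residuals (hypothetical helper)."""
    e2 = np.asarray(resid, dtype=float) ** 2
    T = len(e2)
    y = e2[q:]                                   # e_t^2 for t = q, ..., T-1
    # Columns: intercept, e_{t-1}^2, ..., e_{t-q}^2
    lags = [e2[q - k:T - k] for k in range(1, q + 1)]
    X = np.column_stack([np.ones(T - q)] + lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return (T - q) * r2                          # compare with chi-square(q)
```

For $q = 1$ at the 5% level, a statistic above the chi-square critical value 3.841 indicates an ARCH effect. A p-value can be obtained from the chi-square distribution with $q$ degrees of freedom.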
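The accuracy measures used to compare the models can be computed as in the sketch below. The source does not define MAD explicitly; here it is assumed to be the mean absolute deviation between actual and estimated values. The function name `forecast_errors` is hypothetical.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return (MSE, MAD, MAPE in percent) for two equal-length series (hypothetical helper).

    MAPE is undefined when any actual value is zero.
    """
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    if np.any(a == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    err = a - p
    mse = np.mean(err ** 2)                  # Mean Squared Error
    mad = np.mean(np.abs(err))               # Mean Absolute Deviation (assumed definition)
    mape = 100.0 * np.mean(np.abs(err / a))  # Mean Absolute Percentage Error
    return mse, mad, mape
```

Smaller values of all three measures indicate estimates closer to the observed data.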

CONCLUSIONS
The daily average wind speed in New Orleans, Florida, and Mississippi in the United States is influenced not only by the wind speed on previous days at the same location; the influence of the wind speed in neighboring states cannot be ignored either. The modeling showed that the GSTAR(3;0,0,1)-ARCH(1) model is better than the GSTAR(3;0,0,1) model. This means that the wind speed three days earlier at the closest neighbors influences today's wind speed at the reference location. This model captures the pattern of wind speed volatility better than the other models considered. However, the GSTAR(3;0,0,1)-ARCH(1) model is still not good enough to predict extremely high or extremely low wind speeds. This may be because the ARCH(1) model is not the right model for the error variance. Further analysis of error variance models such as GARCH(p,q) is needed so that the error variance can be modeled better. The GSTAR model with the ARCH effect can also be applied to weather cases in Indonesia, as well as in other fields of science.