Seemingly Unrelated Regression Approach for GSTARIMA Model to Forecast Rain Fall Data in Malang Southern Region Districts

Time series forecasting models can be used to predict phenomena that occur in nature. Generalized Space Time Autoregressive (GSTAR) is a time series model used to forecast data that have both time and space elements. This model is limited to stationary and non-seasonal data. Generalized Space Time Autoregressive Integrated Moving Average (GSTARIMA) is a development of the GSTAR model that accommodates non-stationary and seasonal data. Ordinary Least Squares (OLS) is a method used to estimate the parameters of the GSTARIMA model. OLS estimation of the GSTARIMA parameters does not produce an efficient estimator if the errors are correlated between locations. OLS assumes that the error variance-covariance matrix is constant, $\varepsilon_{ij} \sim NID(0, \sigma^2)$, but in fact the observation locations are correlated, so the error variance-covariance matrix is not constant. Therefore, the Seemingly Unrelated Regression (SUR) approach is used to overcome this weakness of OLS. The SUR assumption for estimating the GSTARIMA parameters is $\varepsilon_{ij} \sim NID(0, \Sigma)$. The SUR parameters are estimated by Generalized Least Squares (GLS). Applying GSTARIMA-SUR to rainfall data in the Malang region yields the GSTARIMA((1)(1,12,36),(0),(1))-SUR model, with an average coefficient of determination of 57.726%.


INTRODUCTION
Time series data are data ordered in time over a certain time span and are modeled by univariate or multivariate time series models [1]. Data that consist of time and space elements are modeled using multivariate space-time models. One of the multivariate models most often used for space-time data is the Generalized Space-Time Autoregressive (GSTAR) model introduced by [2]. GSTAR is limited to space-time data that are stationary and non-seasonal. This condition tends not to be met by data that are non-stationary and contain a seasonal pattern.
GSTARIMA was first implemented by Sun, et al. (2010) to forecast traffic flow data in Beijing. GSTARIMA is a development of the GSTAR model for non-stationary and seasonal data. GSTARIMA is more flexible and practical because every observation location has its own real-time parameters that are not influenced by changes at other observation locations.
Methods that can be used to estimate the GSTARIMA parameters include Ordinary Least Squares (OLS) and the Seemingly Unrelated Regression (SUR) system of equations. OLS assumes that the error variance-covariance matrix is constant. The SUR system of equations is therefore used to overcome this weakness of OLS, because it can accommodate correlations between the rainfall observatory locations. The SUR parameters can be estimated using the Generalized Least Squares (GLS) method. Discussion and implementation of the GSTARIMA model are still scarce, so this research models rainfall data in the Malang region using the GSTARIMA-SUR approach, with observation locations at Jabung, Tumpang, Turen, Tumpuk Renteng, Tangkilsari, Wajak, Blambangan, Bululawang, Tajinan and Poncokusumo.

GSTAR (Generalized Space Time Autoregressive) Model
A GSTAR model with autoregressive order $p$ and spatial orders $\lambda_k$ is denoted GSTAR($p$; $\lambda_1, \ldots, \lambda_p$). In the GSTARIMA extension of this model, $p$ and $q$ are the orders of the autoregressive and moving-average parts, $\lambda_k$ and $m_k$ are the spatial orders of the $k$-th autoregressive and moving-average terms, and $\Delta^d Z(t)$ is the observation vector $Z(t)$ differenced $d$ times. $\Phi_{kl}$ and $\Theta_{kl}$ are the autoregressive and moving-average parameters at time lag $k$ and spatial lag $l$, $W^{(l)}$ is the weight matrix of size $N \times N$, and $e(t)$ is the residual vector of size $N \times 1$, which is random and normally distributed.
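For reference, the two models are commonly written in the literature in the form sketched below. The notation ($\Phi_{kl}$, $\Theta_{kl}$, $W^{(l)}$, $\lambda_k$, $m_k$, with $W^{(0)} = I$) is an assumption chosen to match the symbol descriptions in this section and may differ from the authors' own typesetting.

```latex
% GSTAR(p; \lambda_1, \ldots, \lambda_p), commonly cited form (assumed notation)
Z(t) = \sum_{k=1}^{p} \Big[ \Phi_{k0} + \sum_{l=1}^{\lambda_k} \Phi_{kl} W^{(l)} \Big] Z(t-k) + e(t)

% GSTARIMA with differencing order d, commonly cited form (assumed notation, W^{(0)} = I)
\Delta^{d} Z(t) = \sum_{k=1}^{p} \sum_{l=0}^{\lambda_k} \Phi_{kl} W^{(l)} \Delta^{d} Z(t-k)
                  + e(t) - \sum_{k=1}^{q} \sum_{l=0}^{m_k} \Theta_{kl} W^{(l)} e(t-k)
```

In this form each $\Phi_{kl}$ and $\Theta_{kl}$ is usually taken to be a diagonal matrix, so that every location has its own parameter at each time and spatial lag.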

Seemingly Unrelated Regression (SUR)
Seemingly Unrelated Regression (SUR) is a system of equations whose parameters are estimated using Generalized Least Squares (GLS). In a SUR model with $m$ equations, the residuals $\varepsilon$ are assumed to be independent over time but contemporaneously correlated between equations (locations): $E[\varepsilon_{ir}\varepsilon_{js} \mid X] = 0$ when $r \neq s$ and $E[\varepsilon_{ir}\varepsilon_{js} \mid X] = \sigma_{ij}$ when $r = s$. The variance-covariance matrix is denoted by $\Omega$.
SUR estimation adds the residual variance-covariance matrix to the estimator, assuming $\varepsilon \sim N(0, \Omega)$. The SUR estimator uses the information in the system more efficiently than OLS because the variance in each equation is smaller [3].
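As an illustration of the GLS idea described above, the following is a minimal sketch of a two-step feasible GLS estimator for a SUR system, $\hat{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$ with $\Omega = \Sigma \otimes I_T$. The function name `sur_fgls`, the simulated data, and the choice to estimate $\Sigma$ from OLS residuals are assumptions made for this example; the paper does not state how its estimates were computed.

```python
import numpy as np

def sur_fgls(X_list, y_list):
    """Two-step feasible GLS for a SUR system (illustrative helper, not from the paper)."""
    m = len(X_list)
    T = len(y_list[0])
    # Step 1: equation-by-equation OLS to obtain residuals (T x m).
    resid = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        for X, y in zip(X_list, y_list)
    ])
    # Estimated contemporaneous covariance between equations (m x m).
    sigma = resid.T @ resid / T
    # Step 2: stack the system; X is block-diagonal, y is stacked by equation.
    ks = [X.shape[1] for X in X_list]
    X_big = np.zeros((m * T, sum(ks)))
    col = 0
    for i, X in enumerate(X_list):
        X_big[i * T:(i + 1) * T, col:col + ks[i]] = X
        col += ks[i]
    y_big = np.concatenate(y_list)
    # Omega^{-1} = Sigma^{-1} (Kronecker) I_T, for errors independent over time.
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(T))
    xt_oi = X_big.T @ omega_inv
    beta = np.linalg.solve(xt_oi @ X_big, xt_oi @ y_big)
    return beta, sigma

# Simulated two-equation system with correlated errors (hypothetical data).
rng = np.random.default_rng(0)
T = 200
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=T)
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
y1 = X1 @ np.array([1.0, 2.0]) + errors[:, 0]
y2 = X2 @ np.array([-1.0, 0.5]) + errors[:, 1]
beta, sigma = sur_fgls([X1, X2], [y1, y2])
print(beta)   # estimates of the true coefficients [1.0, 2.0, -1.0, 0.5]
print(sigma)  # estimated covariance between the two equations' errors
```

When every equation has the same regressors, this estimator coincides with equation-by-equation OLS; the efficiency gain appears when the regressors differ and the errors are correlated across equations.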

Model precision with MSE and $R^2$
The goodness of the model can be assessed from the residuals using the Mean Square Error (MSE),
$$\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} e_t^2$$
where $e_t$ is the difference between the observed value and the predicted value $\hat{Z}(t)$, and $N$ is the number of observations. The coefficient of determination states how much of the variance of the dependent variable (Y) can be explained by the independent variable (X). According to Makridakis, et al. (1999), $R^2$ is obtained from the observed values, their mean, and the predicted values, where:
$Z_i(t)$: dependent variable $i$
$\bar{Z}_i(t)$: mean of $Z_i(t)$
$\hat{Z}_i(t)$: predicted value of $Z_i(t)$
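The accuracy measures above can be sketched in a few lines. The MSE follows its definition directly; for $R^2$ the code assumes the ratio of explained to total sum of squares, which is one common textbook form and may not be the exact expression used by the authors. The function names and the sample values are hypothetical.

```python
import math

def mse(actual, predicted):
    """Mean square error: average of squared residuals e_t = actual - predicted."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data (e.g. mm of rainfall)."""
    return math.sqrt(mse(actual, predicted))

def r_squared(actual, predicted):
    """Explained sum of squares divided by total sum of squares (assumed form)."""
    mean = sum(actual) / len(actual)
    explained = sum((p - mean) ** 2 for p in predicted)
    total = sum((a - mean) ** 2 for a in actual)
    return explained / total

# Hypothetical dasarian rainfall values (mm) and model predictions.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(mse(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

For a least-squares fit with an intercept this ratio equals one minus the residual sum of squares over the total sum of squares; for other fits the two forms can differ, so reported values should state which form was used.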

METHODOLOGY
The study locations are the rainfall observatory stations in the Malang southern region districts. The stations are Tumpang, Wajak, Tajinan, Jabung, Poncokusumo, Turen, Tumpuk Renteng, Tangkilsari, Blambangan, and Bululawang. The data used are 10-day rainfall totals (dasarian), because the beginning of the dry season is determined as a dasarian (10 days) with less than 50 mm of rainfall that is followed by several subsequent dasarian. Meanwhile, the beginning of the rainy season is determined as a dasarian (10 days) with 50 mm of rainfall or more that is followed by several subsequent dasarian [4].
The rainfall data used to build the GSTARIMA-SUR model are the in-sample data, which are dasarian rainfall data for the period May 2000 to April 2015. The remaining data, for the period May-June 2015, are the out-of-sample data used to validate the GSTARIMA-SUR model.
The GSTARIMA-SUR model is formed in the following steps: exploration of the rainfall data at the ten rainfall observatory stations; identification of the univariate model (ACF and PACF) and the multivariate model (MPACF and MCCF) to determine the GSTARIMA order; estimation of the model parameters; model validation; and finally forecasting of rainfall in the Malang southern region districts and the surrounding region.

RESULTS AND DISCUSSION
Data exploration is used to make the information in the data easier to see. The exploration can be presented as graphs and descriptive statistics. The movement of rainfall at the 10 locations from May 2000 to April 2015 is shown as a time series plot in Figure 4.1. Identification of the model is used to find the autoregressive and moving-average orders for the GSTARIMA model. The autoregressive order is obtained from the significant lags of the MPACF and the moving-average order from the significant lags of the MACF; among the significant lags, the best is chosen using AIC. The lag with the smallest AIC value is used as the autoregressive and moving-average order of the GSTARIMA model. In addition, the seasonal pattern is identified univariately from the ACF and PACF plots. The model identified for rainfall in the Malang southern region districts is GSTARIMA((1)(1,12,36),(0),(1))-SUR.
Parameters are estimated by inserting weights into the equation to describe the spatial relationship between the rain post locations. The location weights used in this study are inverse distance weights. Parameter estimation gives the GSTARIMA((1)(1,12,36),(0),(1))-SUR model for the Tumpang rain post, according to which rainfall at the Tumpang rain post is influenced by rainfall ten and twenty days earlier and by the location weighting. In addition, the rainfall is seasonal at lag 12 (12 dasarian, about four months) and at lag 36 (annually).
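The inverse distance weighting mentioned above can be sketched as follows. The sketch assumes the usual GSTAR convention of a zero diagonal and rows normalised to sum to one; the paper does not spell out its normalisation, and the function name and the three-post distance table are hypothetical.

```python
def inverse_distance_weights(distances):
    """Row-normalised inverse-distance weights.

    distances[i][j] is the distance between posts i and j (any unit).
    The diagonal is set to zero so a post does not weight itself.
    """
    n = len(distances)
    weights = []
    for i in range(n):
        inv = [0.0 if i == j else 1.0 / distances[i][j] for j in range(n)]
        total = sum(inv)
        weights.append([v / total for v in inv])
    return weights

# Hypothetical distances (km) between three rain posts.
distances = [
    [0.0, 5.0, 10.0],
    [5.0, 0.0, 20.0],
    [10.0, 20.0, 0.0],
]
W = inverse_distance_weights(distances)
for row in W:
    print(row)
```

Nearer posts receive larger weights, so in the model a post's rainfall is tied more strongly to its closest neighbours.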
Based on the above model, in-sample predictions can be obtained for the Tumpang rain post. The lowest prediction $R^2$ is for rainfall in the Tumpang area, but this $R^2$ can still be considered good for rainfall prediction because the values obtained are above 50%. Judging from the RMSE and prediction $R^2$, the GSTARIMA((1)(1,12,36),(0),(1))-SUR model gives similar results for rainfall at the ten observation locations.
Rainfall is forecast using the GSTARIMA((1)(1,12,36),(0),(1))-SUR model. Model validation is done by comparing the actual dasarian rainfall for the period May to June 2015 with the forecasts from the GSTARIMA((1)(1,12,36),(0),(1))-SUR model. Figure 4.3 shows that the forecasts approach the actual data. To check the agreement more formally, a paired two-sample t test was carried out. The test gives a t value of 1.958 with a significance value (p-value) of 0.095. The t-table value with df = 59 is 2.301; because t < t-table (1.9585 < 2.301) and the p-value is greater than 0.05 (0.095 > 0.05), it was decided to accept H0. The conclusion of this validation test is that the dasarian rainfall forecasts do not differ significantly from the actual data, so the GSTARIMA-SUR model formed from the in-sample data can be used to forecast dasarian rainfall in the next period.
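The paired t test used for validation can be sketched as below: the statistic is the mean of the actual-minus-forecast differences divided by its standard error, with n - 1 degrees of freedom. The function name and the six sample values are hypothetical and are not the paper's data; the critical value is taken from a t table rather than computed here.

```python
import math
import statistics

def paired_t_statistic(actual, forecast):
    """Return (t statistic, degrees of freedom) for a paired two-sample t test."""
    diffs = [a - f for a, f in zip(actual, forecast)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical actual and forecast dasarian rainfall (mm).
actual = [12.0, 5.0, 0.0, 8.0, 3.0, 1.0]
forecast = [10.0, 6.0, 1.0, 5.0, 4.0, 0.0]
t_stat, df = paired_t_statistic(actual, forecast)
print(t_stat, df)
```

H0 (no difference between actual and forecast values) is retained when the absolute t statistic is below the t-table value for the chosen significance level and degrees of freedom.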

CONCLUSION
Our analysis found that the GSTARIMA((1)(1,12,36),(0),(1))-SUR model can be used to forecast seasonal dasarian rainfall in the Malang southern region districts. Using the seasonal pattern at lags 1, 12 and 36 gives more accurate forecasts. MSE and prediction $R^2$ can be used to validate the model. In this research, the largest prediction $R^2$ is 57.726%.

Figure 4.1. Time series plot of rainfall data
Figure 4.1 shows that the movement pattern of the rainfall data at the 10 locations tends to be similar. The rises and falls in the rainfall recorded in each period show the same pattern every year.

Figure 4.3. Forecast for the period May-June 2015 in the Malang southern region districts

GSTARIMA (Generalized Space Time Autoregressive Integrated Moving Average) Model
$W^{(l)}$: weight matrix of size $N \times N$
$\Phi_{kl}$: autoregressive parameter at time lag $k$ and spatial lag $l$
$e(t)$: error vector, white noise with a multivariate normal distribution

Table 4.1. Inverse distance weights for the 10 rain posts
A GSTARIMA-SUR model is expressed for rainfall in the Malang southern region districts. The accuracy of the model (model validation) is inspected through the RMSE and $R^2$ values of the model.

Table 4.3. Rainfall forecast for the period May-June 2015