An Application of Geographically Weighted Regression for Assessing Water Pollution in Pontianak, Indonesia

Geographically weighted regression (GWR) is an exploratory analytical tool that creates a set of location-specific parameter estimates. The estimates can be analyzed and represented on a map to provide information on spatial relationships between the dependent and independent variables. A problem faced by GWR users is how best to map these parameter estimates. This paper introduces a simple mapping technique that plots the local t-values of the parameters on one map. The study employed GWR to evaluate chemical parameters of water in Pontianak City. Chemical oxygen demand (COD), an indicator of water pollution, was used as the dependent variable. The factors used for assessing water pollution were pH (X1), iron (X2), fluoride (X3), water hardness (X4), nitrate (X5), nitrite (X6), detergents (X7), and dissolved oxygen, DO (X8). Samples were taken from 42 locations, and their chemical properties were measured in the laboratory. The parameters of the GWR model at each site were estimated and mapped using Geographic Information Systems (GIS). The results of the analysis show that X1, X2, X3, X5, and X7 influence the amount of COD in water. The resulting map can assist the exploration and interpretation of the data.


INTRODUCTION
In a residential area, human activities are among the critical factors that affect the quality of water resources. The more activity in the area, the more waste is discharged into the environment. Pontianak City is the capital of West Kalimantan Province, and its level of land use has increased every year. This increase has reduced the carrying capacity of the city. One form of land use that has risen very rapidly is land for settlements, a trend closely related to population growth. The total population of Pontianak City is estimated at 646,661 people, with a population density of 5,998 people/km² and a population growth rate of 1.95% per year [1].
The quantity and quality of water in a region significantly affect living things. Changes in the quality and quantity of water are strongly influenced by the patterns of land management in the area. Waste generated by daily human activities can cause water quality to deteriorate. Discharged waste has different characteristics that determine the quality of the water around it, and it is diverse in both type and content. The waste can take the form of organic compounds that are degraded by microorganisms as well as inorganic compounds such as soap, detergent, shampoo, and other cleaning agents that can contaminate water [2]. Water quality is measured using physical, biological, and chemical parameters, all of which are essential variables [3]. This paper, however, focuses on chemical variables. Water pollution can also be assessed from the oxygen demand of the water, namely through the measurement of chemical oxygen demand [4]. Chemical oxygen demand (COD) is the total amount of oxygen needed to oxidize organic matter chemically. Household waste is the primary source of organic waste and is one of the leading causes of high COD concentrations. This condition affects humans and the environment; for example, many aquatic organisms die because the level of oxygen dissolved in the water is low. Water that meets the criteria for proper use is increasingly difficult to obtain.
This research aimed to investigate the relationships between a set of chemical variables and COD in the research area. The samples were collected from several locations with different characteristics and types of land and environment. This sampling procedure can create a dependence between the measurements and their locations and hence generates spatial data, to which techniques of spatial analysis can be applied. This research used Geographically Weighted Regression (GWR) to investigate the relationship between the dependent variable and the corresponding independent variables. As an exploratory method, GWR provides extra information for any spatial data set and should be useful across all disciplines in which spatial data are used. Applications of GWR include studies in a wide variety of fields, including but not limited to the analysis of health and disease (see, for example, [5,6,7,8]), environmental equity [9], housing markets [10,11], population density and housing [12], poverty mapping in Malawi [13], urban poverty [14], demography and religion [2], and environmental conditions [15]. Brown et al. [16] used GWR to investigate the relationships between land cover, rainfall, and surface water habitat in predominantly agricultural regions of Southeast Australia and found that GWR provided better estimates than the OLS method. In this study, GWR was applied to investigate the relationships among the chemical variables measured in the water samples.

METHODS
The research was carried out in Pontianak City (Lat. 0°02' N to 0°01' S, Long. 109°16' to 109°23' E). Pontianak is the capital city of West Kalimantan Province, Indonesia. It covers approximately 107.82 km². Soil types in the City of Pontianak are Organosol, Gley, Humus, and Alluvial, each of which has different characteristics.
Sampling was carried out by stratified random sampling. Sub-populations, called strata, were formed from the areas drained by the same tributary, within which the level of water pollution is assumed to be homogeneous. The sample units studied were rivers and ditches, with a total of 42 water samples from different locations representing the six districts of the City of Pontianak. The sites were plotted on a map of Pontianak City in Figure 1 [17]. The samples were taken under the same conditions, namely when the water had receded. In this study, the response variable is COD, and the independent variables are pH (X1), iron (X2), fluoride (X3), hardness (X4), nitrate (X5), nitrite (X6), detergent (X7), and dissolved oxygen, DO (X8).
The GWR model is as follows [18]:
$$y_i = \beta_0(u_i, v_i) + \sum_{p=1}^{k} \beta_p(u_i, v_i)\, x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n, \tag{1}$$
where $\beta_0(u, v), \dots, \beta_k(u, v)$ are $k + 1$ continuous functions of the location $(u, v)$ in the geographical study area, and $\varepsilon_i \sim N(0, \sigma^2)$. The log-likelihood for any particular set of estimates of the functions may be written as follows (see, for example, [18]):
$$L(\beta_0, \dots, \beta_k) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\Big(y_i - \beta_0(u_i, v_i) - \sum_{p=1}^{k}\beta_p(u_i, v_i)\, x_{ip}\Big)^2. \tag{2}$$
Rather than attempting to maximize equation (2) globally, we consider the local likelihood. We consider the problem of estimating the coefficients at a single location $(u_i, v_i)$, where the distance-weighted criterion
$$L_i = -\frac{1}{2\sigma^2}\sum_{j=1}^{n} w_{ij}\Big(y_j - \beta_0(u_i, v_i) - \sum_{p=1}^{k}\beta_p(u_i, v_i)\, x_{jp}\Big)^2$$
is an empirical estimate of the expected log-likelihood at the point of estimation.
GWR employs a weighted distance decay function for model calibration. It assumes that observations closer together have more influence on each other than observations farther apart. The weights given to neighboring samples can be calculated using the exponential distance decay function:
$$w_{ij} = \exp\left(-\frac{d_{ij}^2}{b^2}\right),$$
where $w_{ij}$ is the weight of observation j for observation i, $d_{ij}$ is the distance between observations i and j, and b is the kernel bandwidth. When the distance between observations is greater than the kernel bandwidth, the weight rapidly approaches zero. A fixed-bandwidth kernel uses a bandwidth that is held constant over space, whereas an adaptive-bandwidth kernel adapts the bandwidth distance to the density of observations; bandwidths are smaller where data are dense and larger where data are sparse.
In this study, all GWR models used the adaptive kernel bandwidth as sample densities varied spatially. The optimal bandwidth distance was determined automatically in GWR using the Akaike information criterion (AIC).
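The local calibration described above can be illustrated with a minimal sketch. It assumes the kernel form w_ij = exp(-d_ij²/b²) and estimates the coefficients at each site by weighted least squares, that is, by solving (XᵀW_iX)β = XᵀW_iy with W_i the diagonal matrix of weights for site i. The function names, the synthetic data, and the fixed bandwidth of 0.3 are illustrative assumptions; the sketch does not reproduce the software, the adaptive kernel, or the measurements used in this study.

```python
import numpy as np

def kernel_weights(coords, i, bandwidth):
    """Weights w_ij = exp(-d_ij^2 / b^2) of every observation j relative to site i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-(d ** 2) / bandwidth ** 2)

def gwr_fit_at(coords, X, y, i, bandwidth):
    """Local coefficients (intercept first) at site i by weighted least squares."""
    Xd = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    w = kernel_weights(coords, i, bandwidth)
    XtW = Xd.T * w                              # same as Xd.T @ diag(w)
    return np.linalg.solve(XtW @ Xd, XtW @ y)

# Synthetic, noise-free data (hypothetical): 42 sites, 2 predictors,
# and a relationship that is the same everywhere.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(42, 2))
X = rng.normal(size=(42, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]

# One row of coefficients per site: 42 local models.
betas = np.array([gwr_fit_at(coords, X, y, i, bandwidth=0.3) for i in range(42)])
```

Because the synthetic relationship is spatially constant and free of noise, every row of `betas` recovers the same coefficients; with real data the rows differ from site to site, which is what the maps of local estimates display.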

RESULTS AND DISCUSSION
Tests on spatial aspects consist of two stages: a spatial dependency test and a spatial heterogeneity test. The spatial dependency test was performed using Moran's I test, whereas the spatial heterogeneity test used the Breusch-Pagan test. The test results are presented in Table 1. Moran's I test indicated the existence of spatial dependency in the model, whereas the Breusch-Pagan test showed no differences in characteristics among observation points.
The next step is the selection of the bandwidth to be used in GWR modeling. Bandwidth selection can be made by comparing the cross-validation (CV) values of candidate weighting functions. The weighting functions considered were Fixed Gaussian, Fixed Bisquare, and Fixed Tricube (Table 2). Table 2 shows that the Fixed Gaussian model gives the minimum CV value, with a bandwidth of 0.020963. The Fixed Gaussian model was then used to obtain the weighting matrix for each location.
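Bandwidth selection by cross-validation can be sketched as follows. The example assumes a leave-one-out CV score (each site is predicted from a local fit that excludes its own observation) and a fixed Gaussian-type kernel exp(-d²/b²). The function name, the candidate bandwidths, and the synthetic data are hypothetical and are not the values or the software behind Table 2.

```python
import numpy as np

def cv_score(coords, X, y, bandwidth):
    """Leave-one-out CV: sum of squared errors when each site is predicted
    from a local weighted fit that excludes the site itself."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])  # prepend an intercept column
    sse = 0.0
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-(d ** 2) / bandwidth ** 2)
        w[i] = 0.0                         # drop observation i from its own fit
        XtW = Xd.T * w
        # lstsq tolerates near-singular systems that arise for small bandwidths
        beta = np.linalg.lstsq(XtW @ Xd, XtW @ y, rcond=None)[0]
        sse += (y[i] - Xd[i] @ beta) ** 2
    return sse

# Synthetic data (hypothetical): the slope drifts from west to east.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 1, size=(42, 2))
X = rng.normal(size=(42, 1))
slope = 1.0 + 2.0 * coords[:, 0]
y = slope * X[:, 0] + rng.normal(scale=0.1, size=42)

candidates = [0.1, 0.2, 0.4, 0.8, 1.6]
scores = {b: cv_score(coords, X, y, b) for b in candidates}
best = min(scores, key=scores.get)         # bandwidth with the smallest CV score
```

The same loop can be repeated for other kernel shapes (for example bisquare or tricube) and the kernel and bandwidth with the smallest CV score retained, which mirrors the comparison summarized in Table 2.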
The models obtained by GWR were compared with the Ordinary Least Squares (OLS) model in terms of their sum of squared errors (SSE). The results showed that GWR performed better than OLS (Table 3). Models of COD on the eight independent variables for all 42 locations are presented in Table 4. The coefficient of determination (R²) is also shown in the table; it varied between 75% (Location 3) and 95% (Locations 4 and 24). Partial significance tests based on the t-statistic were carried out to examine which parameters are significant. The values of the t-statistics for each parameter are presented in Figure 2 (a) to (h). Sample locations marked with • indicate that the coefficient of the corresponding Xi is not significant (|t| < 1.64). A second symbol indicates that the coefficient of Xi is almost significant (1.64 ≤ |t| < 1.96), and the remaining symbols, including +, indicate that the coefficient of Xi is significant (|t| ≥ 1.96). The values of t for X1 are generally small for the samples located close to the rivers (Fig. 2 (a)). The variable pH (X1) is not significant at those locations and hence does not contribute to modeling the COD there. However, pH appeared to be significant for the samples located at some distance from the river (18 sample locations). In general, the t values for X2, X3, X5, and X7 are significant (Fig. 2 (b), (c), (e), (g)). These results indicate that iron, fluoride, nitrate, and detergent have a great influence in modeling the COD. High levels of nitrogen compounds in the form of oxidized nitrogen, such as nitrate, tend to reduce the level of dissolved oxygen in water through the oxidation of ammonia [13,10]. Likewise, detergent, in which phosphate compounds are an integral ingredient, plays a major role in the eutrophication of the water body [22].
The t values for X4, X6, and X8 (Fig. 2 (d), (f), and (h)) are generally small, indicating that water hardness (X4), nitrite (X6), and DO (X8) are not significant across the samples. These three variables are, therefore, not important in predicting the COD of the samples.
Plotting the t values of the regression coefficients at each sample location on a map enables the researcher to identify the importance of each variable to the regression model at that location. The application of GWR yields a different model for each location.
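The thresholds used for the map symbols can be expressed as a small classification step. The sketch below only illustrates the rule stated above (|t| < 1.64, 1.64 ≤ |t| < 1.96, |t| ≥ 1.96); the function name, the category labels, and the example t values are hypothetical and are not taken from Figure 2.

```python
import numpy as np

def classify_t(t_values, almost=1.64, significant=1.96):
    """Assign each local t value to one of three significance categories."""
    t = np.abs(np.asarray(t_values, dtype=float))
    labels = np.full(t.shape, "not significant", dtype=object)
    labels[t >= almost] = "almost significant"
    labels[t >= significant] = "significant"  # overrides the previous label
    return labels

# Hypothetical local t values for one coefficient at five sites.
example_t = [0.50, -1.70, 1.96, -2.80, 1.639]
labels = classify_t(example_t)
```

Each category can then be drawn with its own marker at the site coordinates, so that one map per coefficient shows where that variable matters.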

CONCLUSIONS
This study has demonstrated the superiority of GWR over OLS regression for modeling spatially varying relationships between variables. With 42 sample points, GWR yields 42 regression models that accommodate the characteristics of each location. The t values for pH (X1) are generally small for the samples located close to the rivers; however, pH is significant for the samples located at some distance from the river. Iron, fluoride, nitrate, and detergent are significant in the regression modeling of COD. These results could provide the researcher with a more accurate prediction of chemical oxygen demand at each location.