Forecasting Population of Madiun Regency Using ARIMA Method

High population growth in Madiun Regency can increase population density, which has implications for social conditions, the economy, welfare, security, land availability, the availability of clean water, and food needs. This study aims to predict the population growth of Madiun Regency using the ARIMA method. ARIMA is a popular method for forecasting time series data and is considered reliable because the model is built step by step. The ARIMA method includes three component models, namely AR (Autoregressive), MA (Moving Average), and ARMA (Autoregressive Moving Average). This study uses annual population data of Madiun Regency from 1983 to 2021 and produces an ARIMA (0,2,1) forecasting model with a MAPE value of 8.42%. The model predicts that from 2022 to 2024 the population will increase by 17,947 people, or 2.39%. The results of this study are expected to serve as information for the Madiun Regency government in anticipating problems caused by the future population level of Madiun Regency.


INTRODUCTION
One of the most important problems globally is the high population growth in developing countries [1], [2]. Indonesia is listed as one of the five most populous countries in the world. It ranks fourth after China, India, and the United States and is located in Asia, the most populous continent [3]. According to the 2020 Population Census, the population of Indonesia reached 270.20 million. Indonesia has a land area of 1.9 million km² and a population density of 141 people per km², with an average annual population growth rate of 1.25% between 2010 and 2020 [4]. In Indonesia, Java ranks first as the most populous island, and East Java is the second-most populous province after West Java, with 40.67 million people [4]. East Java Province consists of several cities and regencies, one of which is Madiun Regency, whose population growth rate ranked fifth in 2020 and whose population increases every year. The population was 676,087 in 2015 and had grown to 749,070 by 2019 [5].
Population growth in Madiun Regency is affected by the high birth rate. In 2020, the birth rate in Madiun Regency was 3928, with a population growth rate of 0.92% [5]. High population growth can cause various problems, such as regional spatial problems, housing, employment, education, economy, and security. In addition, it can cause problems in social aspects, welfare, availability of clean water, and food needs, and can cause environmental damage [1], [6], [7]. The problems caused by population density need to be anticipated by making predictions so that handling strategies can be prepared. Several statistical and mathematical forecasting models can be used for this purpose, including exponential smoothing, moving average, and ARIMA (Box-Jenkins) models. Other forecasting models are based on artificial intelligence, such as neural networks, genetic algorithms, simulated annealing, and classification [8].
Several previous studies have forecast population. Xu et al. predicted the population of Beijing's main area using the Long Short-Term Memory (LSTM) model with a MAPE of 4.35% [9]. Pakpahan, Basani, and Hariani predicted the population of East Kalimantan using Exponential Smoothing, which yielded a MAPE of 14.81% [10]. On the other hand, there are studies of population prediction using the ARIMA method conducted by Mardiyah et al. in Pasuruan City [11], Nyoni, Mutongi, and Munyaradzi in the Gambia [12], and Nyoni in Zimbabwe, which obtained MAPE < 3.94% [13]. Based on these previous studies, the authors are interested in the ARIMA method, which achieves an excellent level of accuracy in some cases of population forecasting. The ARIMA method is flexible and straightforward to apply and gives accurate results for short-term prediction, but its accuracy for long-term forecasting is poor, and the forecasts usually tend to flatten over a long horizon [14], [15].
Beyond population forecasting, ARIMA is also widely used in various case studies, including research by Alabdulrazzaq on predicting the spread of COVID-19 with a MAPE of 4.2% [16]. Swaraj studied COVID-19 predictions in India with a MAPE of 4.7% [17]. Guha and Bandyopadhyay predicted the price of gold with a MAPE of 3.25% [18]. Banerjee studied forecasting on the Indian stock market with a MAPE of 3.33% [19]. Grigonytė and Butkevičiūtė predicted wind speed in Latvia with a MAPE of 1% [20]. Based on the explanation above, this study uses the ARIMA method to predict the population of Madiun Regency. This research is expected to provide information for the Madiun Regency Government to take policy steps that minimize the risks arising from the high rate of population growth in Madiun Regency.

The Data
The data used in this study are the annual population figures of Madiun Regency from 1983 to 2021, taken from the Central Bureau of Statistics of Madiun Regency through the website https://bit.ly/PendudukKabMadiun [21].

ARIMA
Box and Jenkins first developed the ARIMA model in the 1970s [22]. ARIMA is one of the econometric methods used to predict univariate time-series data. Box and Jenkins state that this model does not use independent variables but instead utilizes the information in the series itself to generate predicted values. Therefore, the ARIMA model requires autocorrelation in the series. Autocorrelation is the correlation between two observations at different points in a time series. In other words, time series data is self-correlated. Time series models in the ARIMA method include autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) [23], [24].
The Box-Jenkins ARIMA analysis begins by drawing a time series plot and an ACF plot to determine whether the data are stationary in mean and in variance. If the data are not stationary in mean, differencing must be applied. If the data are not stationary in variance, a Box-Cox transformation is performed. These steps are repeated until the data are stationary. Once the data are stationary, the next step is to identify tentative ARIMA models from the ACF and PACF plots. The parameters of each tentative model are then tested for significance, and the Ljung-Box test is used to test the hypothesis that the residuals are white noise. In summary, ARIMA forecasting proceeds in several stages, namely model identification, parameter estimation, diagnostic testing, and prediction.

Model Identification
When identifying the model in the ARIMA method, the data used must meet the stationarity requirement. If the data do not meet this requirement, they must first be made stationary in both variance and mean [25]. Stationarity in variance is obtained with the Box-Cox transformation, whose equation is as follows [26].
$$T(Z_t) = \begin{cases} \dfrac{Z_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln Z_t, & \lambda = 0 \end{cases}$$
Where:
$T(Z_t)$ : transformed data value
$Z_t$ : data value at time t
$\lambda$ : the estimated value of the transformation parameter
The transformed data is determined by the lambda value. Some commonly used values of lambda and the associated transformations are as follows.
λ = -1 : 1/Z_t
λ = -0.5 : 1/√Z_t
λ = 0 : ln Z_t
λ = 0.5 : √Z_t
λ = 1 : Z_t (no transformation)
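The Box-Cox transformation can be illustrated with a few lines of Python. This is a minimal sketch using only the standard library; the function name `box_cox` and the sample values are assumptions chosen for illustration and are not taken from the study.

```python
import math

def box_cox(values, lam):
    """Box-Cox power transformation of strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        # Limiting case of the power formula as lambda approaches 0
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

series = [100.0, 110.0, 121.0]   # hypothetical values, not the Madiun data
print(box_cox(series, 1))        # lambda = 1: values only shifted by 1
print(box_cox(series, 0))        # lambda = 0: natural logarithm
```

With lambda equal to 1 the series is only shifted by a constant, so its variance pattern is unchanged, which is why an estimated lambda of 1 is read as no transformation being needed.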
For time-series data that are not yet stationary in mean, the data must be differenced, that is, replaced by the difference between each observation and the one preceding it. The differencing equation is as follows.
$$Z'_t = Z_t - Z_{t-1}$$
Here $Z'_t$ is the differenced data value and $Z_t$ is the data value at time t. Once the data are stationary, a tentative ARIMA (p, d, q) model is obtained. Here p is the lag at which the Partial Autocorrelation Function (PACF) plot crosses the significance limit, d is the order of differencing applied, and q is the lag at which the Autocorrelation Function (ACF) plot crosses the significance limit.
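Differencing and the sample autocorrelation function can be sketched in plain Python as follows. The helper names `difference` and `acf`, the sample series, and the use of the approximate 95% bound 1.96/√n as the significance limit are assumptions made for illustration.

```python
import math

def difference(values, d=1):
    """Apply d rounds of first-order differencing: Z'_t = Z_t - Z_(t-1)."""
    out = list(values)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out

def acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(x * x for x in dev)
    if denom == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

series = [10, 12, 15, 19, 24, 30]      # hypothetical values
print(difference(series, d=1))         # first differences
print(difference(series, d=2))         # second differences
print(acf(series, max_lag=2))          # autocorrelation at lags 0, 1, 2
print(1.96 / math.sqrt(len(series)))   # approximate 95% significance bound
```

Each round of differencing shortens the series by one observation, so a model with d = 2 is fitted on two fewer points than the original data.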
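Autoregressive is a model in which the dependent variable is influenced by its own past values, because the data used is a single series. In general, an AR model of order p, written AR(p), has the following form [27].
$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \dots + \phi_p Z_{t-p} + a_t$$
Where:
$Z_t$ : stationary time series
$\phi_1, \dots, \phi_p$ : autoregressive parameters
$a_t$ : residual (white noise) at time t
For the Autoregressive Integrated Moving Average (ARIMA) model, the data used must be stationary. ARIMA's general statement is as follows [30].
$$\phi_p(B)\,(1 - B)^d Z_t = \theta_q(B)\, a_t$$
Here B is the backshift operator ($B Z_t = Z_{t-1}$), $\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ is the autoregressive polynomial, $\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q$ is the moving average polynomial, and d is the order of differencing.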
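To make the notation concrete, the ARIMA (0,2,1) form can be expanded by hand. The lines below are a sketch that assumes the Box-Jenkins sign convention for the moving average polynomial, $\theta(B) = 1 - \theta_1 B$, and no constant term. Here $Z_t$ denotes the series, $a_t$ the white noise shock, and B the backshift operator ($B Z_t = Z_{t-1}$); these symbols are chosen for illustration.

$$
\begin{aligned}
(1 - B)^2 Z_t &= (1 - \theta_1 B)\, a_t \\
Z_t - 2 Z_{t-1} + Z_{t-2} &= a_t - \theta_1 a_{t-1} \\
Z_t &= 2 Z_{t-1} - Z_{t-2} + a_t - \theta_1 a_{t-1}
\end{aligned}
$$

Under these assumptions, each value is a linear extrapolation of the two preceding values, adjusted by the current shock and a fraction of the previous one.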

Parameter Estimation
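Each tentative model goes through an estimation stage and feasibility tests to find the best model. The hypothesis of the parameter significance test and its test statistic are as follows [29].
$$H_0: \beta = 0 \quad \text{(the parameter is not significant)}, \qquad H_1: \beta \neq 0 \quad \text{(the parameter is significant)}$$
$$t = \frac{\hat{\beta}}{SE(\hat{\beta})}$$
Where:
$\hat{\beta}$ : estimate of an autoregressive or moving average model parameter
$SE(\hat{\beta})$ : standard error of the estimate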
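The significance test can be sketched as a small Python function. This is an illustration only: the function name `parameter_significance` and the input values are hypothetical, and the p-value is computed with a standard normal approximation rather than the t distribution, which is an assumption made to keep the example within the standard library.

```python
from statistics import NormalDist

def parameter_significance(estimate, std_error, alpha=0.05):
    """Return (t statistic, two-sided p-value, significant?) for one parameter."""
    t_stat = estimate / std_error
    # Assumption: standard normal approximation to the t distribution
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value, p_value < alpha

# Hypothetical estimates, not values from the study
print(parameter_significance(0.8, 0.2))   # large |t|: parameter is significant
print(parameter_significance(0.1, 0.2))   # small |t|: parameter is not significant
```

For short series the normal approximation gives somewhat smaller p-values than the t distribution, so borderline cases deserve a check against t tables or a statistics package.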

Diagnostic Test
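Diagnostic tests are used to determine whether or not a model is adequate. A good model is one whose residuals satisfy the white noise assumption, which is tested with the Ljung-Box statistic as follows [6], [29].
$$Q = n(n+2) \sum_{k=1}^{K} \frac{\hat{\rho}_k^{2}}{n - k}$$
Where:
$\hat{\rho}_k$ : autocorrelation of the residuals at lag k
$Q$ : Ljung-Box test statistic
$k$ : lag, with K the maximum lag tested
$n$ : number of observations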
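The Ljung-Box statistic can be computed directly from a list of residuals. The sketch below is a plain-Python illustration; the function name `ljung_box_q` and the residual values are assumptions, and instead of a p-value the statistic is compared with a chi-square critical value taken from tables (3.841 for 1 degree of freedom at the 5% level).

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(x * x for x in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

# Hypothetical residuals that alternate in sign, so they are clearly not white noise
residuals = [1, -1, 1, -1, 1, -1, 1, -1]
q = ljung_box_q(residuals, max_lag=1)
print(q)
print(q > 3.841)   # True means the white noise hypothesis is rejected at 5%
```

When the residuals come from a fitted ARIMA (p, d, q) model, the degrees of freedom are usually reduced to K - p - q, so K must be larger than the number of estimated parameters.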

Prediction Accuracy Value
The results produced by the ARIMA model are assessed in terms of forecast accuracy. The error of each model is measured with the MAPE (Mean Absolute Percentage Error), which is calculated with the following formula [16], [31].
$$PE_t = \frac{e_t}{Z_t} \times 100\%, \qquad MAPE = \frac{1}{n} \sum_{t=1}^{n} \left| PE_t \right|$$
Where:
$PE_t$ : percentage error at time t
$e_t$ : error value at time t (actual value minus forecast)
$Z_t$ : actual data at time t
$n$ : number of observations
The quality of the prediction is indicated by the MAPE value, which can be interpreted in four categories, namely excellent (MAPE < 10%), good (MAPE 10% to 20%), good enough (MAPE 20% to 50%), and not good (MAPE > 50%).
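The MAPE calculation and its interpretation can be written as two short Python functions. This is a minimal sketch: the names `mape` and `mape_category` and the sample numbers are hypothetical, and the handling of values that fall exactly on a category boundary is an assumption.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    pct_errors = [(a - f) / a * 100.0 for a, f in zip(actual, forecast)]
    return sum(abs(pe) for pe in pct_errors) / len(pct_errors)

def mape_category(value):
    """Map a MAPE value (in percent) to the four quality categories."""
    # Assumption: boundary values go to the better category above 10%
    if value < 10:
        return "excellent"
    if value <= 20:
        return "good"
    if value <= 50:
        return "good enough"
    return "not good"

actual = [100, 200, 400]      # hypothetical actual values
forecast = [110, 180, 400]    # hypothetical forecasts
score = mape(actual, forecast)
print(score, mape_category(score))
print(mape_category(8.42))    # the MAPE reported for ARIMA (0,2,1)
```

Because each error is divided by the actual value, MAPE is undefined when an actual value is zero, which is not a concern for population counts.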

RESULTS AND DISCUSSION
Based on Table 1, a time series plot is drawn to determine the ARIMA model and to assess the stationarity of the data.
Based on Figure 1, the data show an upward (positive) trend. The data are not stationary; for example, in 2019 the population rose by 67,676 people compared with the previous year. Data are stationary when neither the variance nor the mean increases or decreases over time.
Figure 2 shows that the lambda value is equal to 1, so the data can be said to be stationary in variance. Stationarity in the mean can be judged from the ACF plot and the time series plot. The data do not yet have a fixed pattern.
From the plot in Figure 3, it appears that the lags decay slowly. The time series plot also does not have a fixed pattern, so the data are not stationary in the mean. As a result, it is necessary to transform the data further through differencing so that the data become stationary.
Figure 4 shows that the data are stationary in the mean. Once the data are stationary, the next step is to plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
The base model is ARIMA (0,2,0). This is a random-walk-type model in which the autoregressive coefficient is equal to 1, so the tentative ARIMA models are ARIMA (1,2,0), (0,2,1), and (1,2,1).
A significance test and a residual white noise test were carried out to choose the model used in the prediction. The significance of the parameters is judged from the p-value. If the p-value is less than 0.05, the model is significant. The results of the significance test for the tentative ARIMA (1,2,0), (0,2,1), and (1,2,1) models are as follows.
Table 3 shows that the ARIMA (1,2,1) model is not significant because its p-values are more than 0.05, while the ARIMA (1,2,0) and (0,2,1) models are significant because their p-values are less than 0.05. After the parameter significance test, a residual white noise test using Ljung-Box is performed to determine which model to use for prediction. If the p-value is more than 0.05, the model meets the white noise requirement. The Ljung-Box test results for the ARIMA (1,2,0), (0,2,1), and (1,2,1) models are as follows.
Table 4 shows that, after the Ljung-Box test, the residuals of the ARIMA (1,2,0), (0,2,1), and (1,2,1) models are white noise because the p-values are more than 0.05. After the white noise test, the MAPE value of each tentative ARIMA model is determined. Based on Table 3, Table 4, and Table 5, among the ARIMA models whose parameters are significant and whose residuals meet the white noise assumption, the ARIMA (0,2,1) model has the smallest MAPE value. This model was therefore chosen to predict the number of residents of Madiun Regency.
Figure 6 shows that there is not much difference between the actual data and the forecast, and the MAPE obtained is below 10%, which means that the prediction model is already very good. The prediction of the number of residents of Madiun Regency for the next three years (2022 to 2024) is presented in Table 6.
Table 6 shows that the population of Madiun Regency from 2022 to 2024 is predicted to increase by 17,947 people, or 2.39%. The predictions show an upward trend every year.
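How an ARIMA (0,2,1) model turns past observations into multi-step forecasts can be sketched in Python. The example assumes the recursion Z_t = 2 Z_(t-1) - Z_(t-2) + a_t - theta * a_(t-1) with no constant term; the function name `forecast_arima_021`, the value of theta, and the sample series are hypothetical and are not the fitted parameters or data of this study.

```python
def forecast_arima_021(series, theta, steps):
    """Multi-step forecasts from an ARIMA(0,2,1) model without a constant."""
    # Recover the one-step-ahead residuals recursively.
    a_prev = 0.0   # assumption: the shock before the sample is zero
    for t in range(2, len(series)):
        predicted = 2 * series[t - 1] - series[t - 2] - theta * a_prev
        a_prev = series[t] - predicted
    history = list(series)
    forecasts = []
    for h in range(steps):
        # Only the first forecast uses the last residual; future shocks
        # are replaced by their expected value of zero.
        shock_term = -theta * a_prev if h == 0 else 0.0
        next_value = 2 * history[-1] - history[-2] + shock_term
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts

series = [100, 110, 120, 130]   # hypothetical, perfectly linear series
print(forecast_arima_021(series, theta=0.5, steps=3))
```

After the first step the forecasts lie on a straight line through the two most recent values, which is consistent with a steady upward trend in the short-term predictions and also shows why such a model cannot capture changes in the growth pattern over long horizons.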
Population growth, if followed by an increase in the quality of human resources, will become a regional potential for development. On the other hand, if the increase in population is not accompanied by good quality human resources, it will become a burden for regional development [32].
Population data prediction is needed in planning and evaluating human-oriented development, in which the population is the primary target, because the population is both an object and a subject of development. As an object, the population is the target of development. As a subject, the population is the actor that carries out development. The two functions are expected to go hand in hand in an integrated way [32].

CONCLUSIONS
Based on the results of this research on predicting the population of Madiun Regency using the ARIMA method, it can be concluded that the best model for predicting the number of residents of Madiun Regency is ARIMA (0,2,1), with a MAPE of 8.42%. The predicted number of residents of Madiun Regency in 2022 is 758,561 people. The ARIMA method is an appropriate choice for forecasting the number of residents of Madiun Regency because it produces an error value of less than 10%. This method can be applied to similar case studies, especially forecasting the number of residents in other areas.