Modeling Length of Hospital Stay for Patients With COVID-19 in West Sumatra Using Quantile Regression

This study aims to construct a model for the length of hospital stay of patients with COVID-19 using quantile regression and Bayesian quantile regression approaches. Quantile regression models the relationship between several independent variables and any chosen point of the conditional distribution of the dependent variable. Bayesian quantile regression incorporates quantile analysis into the Bayesian framework: the Asymmetric Laplace Distribution (ALD) is used to form the likelihood function, which is the basis for formulating the posterior distribution. All 688 patients with COVID-19 treated at M. Djamil Hospital and Universitas Andalas Hospital in Padang City between March and July 2020 were included in this study. The Bayesian quantile regression method yields narrower 95% confidence intervals and higher $R^2$ values than the quantile regression method. We conclude that the Bayesian quantile regression method tends to yield a better model than the quantile regression method. Based on the Bayesian quantile regression method, the length of hospital stay of patients with COVID-19 in West Sumatra was significantly influenced by Age, Diagnoses, and Discharge status.


INTRODUCTION
COVID-19 has become a concern for every segment of the global community. Among people infected with COVID-19 in West Sumatra Province, many have been declared cured, while others have died or are still undergoing treatment in hospital. People who meet the criteria for severe COVID-19 symptoms must be treated in a hospital [1]. Several factors influence the length of stay of COVID-19 patients. To identify these factors, the parameters of a regression model are estimated using the quantile regression and Bayesian quantile regression methods. Estimates of the length of stay of hospitalized COVID-19 patients can serve specific purposes, such as planning health service activities, estimating the need for health facilities at each level of health care, and preparing decisions on mitigation scenarios and preparedness for COVID-19 [2]-[4].
If the linear model assumptions, such as no multicollinearity, homoscedasticity, and no autocorrelation, are fulfilled, the Ordinary Least Squares (OLS) method can be used to estimate the model parameters [5]. A preliminary analysis showed that the length of stay of COVID-19 patients in West Sumatra Province is not normally distributed, so OLS is not efficient for estimating the model parameters. For this reason, the parameters were estimated using quantile regression and Bayesian quantile regression. Quantile regression was chosen because it makes no distributional assumption about the error term, including normality; it requires only a sufficiently large sample. Quantile analysis is incorporated into the Bayesian framework so that the resulting estimator is more effective and natural and can therefore produce a better predictive model whose predictions are closer to the actual values [6], [7].
Research on Bayesian quantile regression was initiated by Yu and Moyeed [8]. Research on this topic then developed rapidly, including numerical simulation studies of parameter estimation for Bayesian quantile regression using the Gibbs sampling algorithm [9]. Bayesian quantile regression based on the Asymmetric Laplace Distribution (ALD) has also been applied to binary response data [10]. Subsequent research addressed variable selection in quantile regression using Gibbs sampling [7]. Bayesian quantile regression has also been used to estimate models by approximating the likelihood function [11] and to study posterior inference under the ALD likelihood [12]. It has been applied to model Jeonse deposits in Korea [13]. Oh et al. performed variable selection in Bayesian quantile regression using the Savage-Dickey density ratio [14]. Furthermore, Bayesian quantile regression with a Gibbs sampling algorithm has been applied to construct a model of low birth weight [15].
This study aims to construct a model of the length of stay of COVID-19 patients using the quantile and Bayesian quantile regression methods and then to compare the results of the two methods. The problem is important to investigate because COVID-19 cases are increasing and, as a result, hospital rooms are becoming full. This research therefore seeks to identify the factors that affect the length of stay of COVID-19 patients, and its findings are expected to inform efforts to shorten that length of stay.

Material
Huskamp et al. and Kaufman et al. found that mortality is higher in older populations than in younger ones [16], [17]. Yuki et al. found that older patients were more susceptible to a longer hospital stay than younger patients [18]. This implies that age may influence a patient's length of hospital stay. Many studies have also found that hypertension, diabetes, and coronary artery disease are risk factors for COVID-19 [19]. Gebhard et al. showed that COVID-19 is deadlier for infected men than for women [20].
The hypothesized model is constructed from the literature and then fitted to the data. The data comprise 688 COVID-19 patients treated at M. Djamil Hospital, Padang City, and Andalas University Hospital between March and July 2020. The variables used in this study are factors assumed to affect the length of stay of COVID-19 patients in West Sumatra. In Figure 1, part (a) shows that the histogram of the length of stay of COVID-19 patients is skewed to the left, while part (b) shows that some data points do not lie along the straight line. Both panels indicate that the length of stay of COVID-19 patients is not normally distributed.
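Figure 1 is not reproduced here, but the two checks it describes are easy to compute. The following is a minimal Python sketch using only the standard library; the helper names `sample_skewness` and `normal_qq_points` are hypothetical and are not part of the study's R analysis, and the toy numbers are illustrative rather than the hospital data. A negative skewness indicates a left-skewed sample, and Q-Q points that stray from a straight line indicate departure from normality.

```python
import statistics


def sample_skewness(xs):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 using population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def normal_qq_points(xs):
    """Return (theoretical N(0, 1) quantile, ordered observation) pairs for a Q-Q plot."""
    n = len(xs)
    std_normal = statistics.NormalDist()
    # plotting positions (i - 0.5) / n, one common convention
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(xs), start=1)]


# Hypothetical toy lengths of stay in days (not the study data)
toy_stay = [3, 5, 7, 9, 10, 11, 12, 12, 13, 14]
print(round(sample_skewness(toy_stay), 3))
for theoretical, observed in normal_qq_points(toy_stay):
    print(f"{theoretical:6.3f}  {observed}")
```

Plotting the pairs from `normal_qq_points` gives the kind of normal Q-Q plot used to judge whether the points follow a straight line.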

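Ferra Yanuar

Quantile Regression Method
The quantile regression model at the $\tau$-th quantile, where $0 < \tau < 1$, with response $y_i$ and predictor vector $x_i$ for $i = 1, 2, \ldots, n$, is written in the form:
$$y_i = x_i'\beta(\tau) + \varepsilon_i, \qquad (1)$$
where $\beta(\tau)$ is the parameter vector and $\varepsilon$ is the residual vector.
The $\tau$-th conditional quantile function in the quantile regression method is defined as $Q_y(\tau \mid x) = x'\beta(\tau)$, and the parameter estimate $\hat\beta(\tau)$ is obtained by minimizing [21]:
$$\hat\beta(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i'\beta\right), \qquad (2)$$
where $\rho_\tau(u) = u\left(\tau - I(u < 0)\right)$ is a loss function, which is equivalent to:
$$\rho_\tau(u) = \begin{cases} \tau u, & u \ge 0, \\ (\tau - 1)u, & u < 0. \end{cases} \qquad (3)$$
Equation (2) is minimized with the simplex method of linear programming. However, estimating the parameters with the simplex method is complicated. Therefore, a Bayesian approach is used to make the parameter estimation process easier.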
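To make Equation (2) concrete, the sketch below evaluates the check loss $\rho_\tau$ and minimizes the objective for a single-predictor model. Instead of the simplex method used in the paper, it relies on a known property of this linear program (Koenker and Bassett, 1978): some optimal solution passes exactly through $p$ observations, where $p$ is the number of parameters (here $p = 2$). Enumerating every line through two observations is therefore exact but costs $O(n^3)$, so it is only suitable for tiny illustrative data; in R, `quantreg::rq` performs this minimization efficiently. The function names and toy data are hypothetical.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0)) from Equation (3)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def objective(b0, b1, xs, ys, tau):
    """Sum of check losses in Equation (2) for the line y = b0 + b1 * x."""
    return sum(check_loss(y - (b0 + b1 * x), tau) for x, y in zip(xs, ys))


def fit_quantile_line(xs, ys, tau):
    """Exact tau-th quantile regression line by enumerating lines through 2 points."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical pair: no line of the form y = b0 + b1 * x
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        value = objective(b0, b1, xs, ys, tau)
        if best is None or value < best[0]:
            best = (value, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Toy data on the line y = 1 + 2x plus one large outlier
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 100]
print(fit_quantile_line(xs, ys, 0.5))  # prints (1.0, 2.0): the median line ignores the outlier
```

The outlier example also shows why quantile regression is attractive for skewed data: an OLS line would be pulled strongly toward the point at $y = 100$.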

Bayesian Quantile Regression Method
Yu and Moyeed [8] showed that minimizing the loss function of quantile regression is equivalent to maximizing a likelihood function formed by assuming that the data follow an Asymmetric Laplace Distribution (ALD). Using the ALD as the likelihood makes Bayesian estimators more effective and natural. The ALD therefore provides a parametric link between the minimization problem in Equation (2) and maximum likelihood theory [7]. In addition, the quantile regression loss function appears, up to sign and scale, in the exponent of the ALD likelihood [22].
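The equivalence stated by Yu and Moyeed can be checked numerically. The sketch below assumes the standard three-parameter ALD density $f(y;\mu,\sigma,\tau) = \frac{\tau(1-\tau)}{\sigma}\exp\{-\rho_\tau((y-\mu)/\sigma)\}$. Because $\rho_\tau(u/\sigma) = \rho_\tau(u)/\sigma$ for $\sigma > 0$, the ALD log-likelihood equals a constant minus $\frac{1}{\sigma}\sum_i \rho_\tau(y_i - \mu)$, so the $\mu$ that maximizes the likelihood is the one that minimizes the quantile loss. Function names and data are hypothetical.

```python
import math


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, tau):
    """Log density of ALD(mu, sigma, tau): log(tau(1-tau)/sigma) - rho_tau((y-mu)/sigma)."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def ald_loglik(ys, mu, sigma, tau):
    """Log-likelihood of i.i.d. observations under a common ALD."""
    return sum(ald_logpdf(y, mu, sigma, tau) for y in ys)


def quantile_loss(ys, mu, tau):
    """Quantile regression objective for a constant (location-only) model."""
    return sum(check_loss(y - mu, tau) for y in ys)


ys = [2.0, 7.0, 1.0, 9.0, 4.0]
tau, sigma = 0.3, 2.0
# Candidate locations: for a constant model an optimum is attained at an observation
mle = max(ys, key=lambda m: ald_loglik(ys, m, sigma, tau))
min_loss = min(ys, key=lambda m: quantile_loss(ys, m, tau))
print(mle, min_loss)  # prints 2.0 2.0: both criteria pick the same location
```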
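The model parameters can be estimated with the Bayesian quantile regression method for any data distribution by assuming the following [8]:
1. $f(y_i; \mu_i)$ follows an ALD.
2. $\mu_i = x_i'\beta(\tau)$.
The observations are given by $y = (y_1, y_2, \cdots, y_n)$. Based on Equation (4), the ALD is used to form the likelihood function, which allows the quantile regression method to be combined with the Bayesian method to estimate the parameter $\beta(\tau)$. The ALD can be represented as a mixture based on the exponential and normal distributions [9]: a random variable $\varepsilon$ following an ALD with location 0 and scale $\sigma$ can be expressed as
$$\varepsilon = \theta z + \psi\sqrt{\sigma z}\,u, \qquad \theta = \frac{1-2\tau}{\tau(1-\tau)}, \qquad \psi^2 = \frac{2}{\tau(1-\tau)},$$
where $z$ is exponentially distributed with mean $\sigma$ and $u \sim N(0, 1)$ is independent of $z$. The $\tau$-th quantile regression model can then be written as
$$y_i = x_i'\beta(\tau) + \theta z_i + \psi\sqrt{\sigma z_i}\,u_i,$$
and the likelihood function is obtained as follows:
$$L(\beta(\tau), \sigma \mid y) = \frac{\tau^n (1-\tau)^n}{\sigma^n} \exp\left\{-\sum_{i=1}^{n} \rho_\tau\left(\frac{y_i - x_i'\beta(\tau)}{\sigma}\right)\right\}.$$
Combining this likelihood with a prior distribution $\pi(\beta(\tau), \sigma)$ gives the posterior distribution $\pi(\beta(\tau), \sigma \mid y) \propto L(\beta(\tau), \sigma \mid y)\,\pi(\beta(\tau), \sigma)$.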
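The exponential-normal mixture representation of the ALD [9] is what makes Gibbs sampling convenient, and it can be sanity-checked by simulation. The sketch below assumes $\sigma = 1$, draws $\varepsilon = \theta z + \psi\sqrt{z}\,u$ with $z \sim \text{Exp}(1)$ and $u \sim N(0,1)$, and checks two properties of the ALD with location zero: the share of negative draws should be close to $\tau$ (zero is its $\tau$-th quantile), and the sample mean should be close to $\theta$. The function name and simulation settings are illustrative assumptions.

```python
import math
import random


def sample_ald_mixture(tau, n, seed=0):
    """Draw n values of eps = theta*z + psi*sqrt(z)*u, z ~ Exp(1), u ~ N(0, 1).

    With sigma = 1 these draws follow an ALD with location 0 and quantile tau.
    """
    rng = random.Random(seed)
    theta = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    psi = math.sqrt(2.0 / (tau * (1.0 - tau)))
    draws = []
    for _ in range(n):
        z = rng.expovariate(1.0)   # exponential mixing variable, mean 1
        u = rng.gauss(0.0, 1.0)    # independent standard normal
        draws.append(theta * z + psi * math.sqrt(z) * u)
    return draws


tau = 0.75
draws = sample_ald_mixture(tau, 100_000)
share_negative = sum(d < 0 for d in draws) / len(draws)
sample_mean = sum(draws) / len(draws)
print(f"P(eps < 0) ~ {share_negative:.3f} (target {tau})")
print(f"mean ~ {sample_mean:.3f} (target {(1 - 2 * tau) / (tau * (1 - tau)):.3f})")
```

Conditional on $z_i$, each observation is normal, which is why the full conditional distributions in a Gibbs sampler become standard.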
The posterior distribution is then used to estimate the posterior mean, which serves as the point estimate of each unknown parameter, and the posterior variance, using the Gibbs sampling iteration method [23], [24].
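A minimal Gibbs sampler built on the mixture representation is sketched below for the simplest case: an intercept-only model $y_i = \mu + \theta z_i + \psi\sqrt{z_i}\,u_i$ with $\sigma$ fixed at 1 and a normal prior $\mu \sim N(0, 100)$. These simplifications, the prior, and all names are assumptions for illustration; the study's R implementation, its priors, and its treatment of $\sigma$ and the covariates are not reproduced here. Each sweep draws $\mu$ from its normal full conditional and each $z_i$ from a generalized inverse Gaussian full conditional, using the fact that $1/z_i$ is then inverse Gaussian with mean $\sqrt{\phi/\chi_i}$ and shape $\phi$, where $\chi_i = (y_i - \mu)^2/\psi^2$ and $\phi = \theta^2/\psi^2 + 2$. The retained draws give the posterior mean, the posterior variance, and a 95% interval.

```python
import math
import random


def sample_inverse_gaussian(mean, shape, rng):
    """Michael-Schucany-Haas (1976) sampler for the inverse Gaussian distribution."""
    nu = rng.gauss(0.0, 1.0)
    y = nu * nu
    x = (mean + mean * mean * y / (2.0 * shape)
         - (mean / (2.0 * shape)) * math.sqrt(4.0 * mean * shape * y + (mean * y) ** 2))
    if rng.random() <= mean / (mean + x):
        return x
    return mean * mean / x


def gibbs_intercept_bqr(ys, tau, n_iter=2000, burn_in=500, m0=0.0, s0_sq=100.0, seed=1):
    """Gibbs sampler for y_i = mu + theta*z_i + psi*sqrt(z_i)*u_i, sigma fixed at 1."""
    rng = random.Random(seed)
    theta = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    psi_sq = 2.0 / (tau * (1.0 - tau))
    phi = theta * theta / psi_sq + 2.0       # GIG parameter shared by every z_i
    zs = [1.0] * len(ys)                     # latent exponential variables
    draws = []
    for it in range(n_iter):
        # mu | z, y ~ Normal, because y_i - theta*z_i ~ N(mu, psi^2 * z_i)
        precision = 1.0 / s0_sq + sum(1.0 / (psi_sq * z) for z in zs)
        weighted = m0 / s0_sq + sum((y - theta * z) / (psi_sq * z) for y, z in zip(ys, zs))
        mu = rng.gauss(weighted / precision, math.sqrt(1.0 / precision))
        # z_i | mu, y_i ~ GIG(1/2, chi_i, phi), so 1/z_i ~ InvGauss(sqrt(phi/chi_i), phi)
        for i, y in enumerate(ys):
            chi = max((y - mu) ** 2 / psi_sq, 1e-12)  # guard against y_i == mu
            zs[i] = 1.0 / sample_inverse_gaussian(math.sqrt(phi / chi), phi, rng)
        if it >= burn_in:
            draws.append(mu)
    return draws


def summarize(draws):
    """Posterior mean, posterior variance and an equal-tailed 95% interval."""
    n = len(draws)
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    ordered = sorted(draws)
    return mean, var, (ordered[int(0.025 * n)], ordered[int(0.975 * n) - 1])


ys = [i / 100 for i in range(101)]   # hypothetical toy responses on [0, 1]
draws = gibbs_intercept_bqr(ys, tau=0.5)
print(summarize(draws))
```

Extending this sketch to covariates replaces the scalar normal update for $\mu$ with a multivariate normal update for $\beta(\tau)$; the update for each $z_i$ keeps the same form.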
The goodness of fit of both methods is measured using $R^2$ [25], computed as
$$R^2(\tau) = 1 - \frac{\hat V(\tau)}{\tilde V(\tau)},$$
where $\hat V(\tau)$ is the residual absolute sum of weighted differences between the observed dependent variable and the estimated quantile of the conditional distribution in the more complex model, that is, $\hat V(\tau) = \sum_{i=1}^{n} \rho_\tau(y_i - x_i'\hat\beta(\tau))$, while $\tilde V(\tau)$ is the total absolute sum of weighted differences between the observed dependent variable and the estimated quantile of the conditional distribution in the simplest model. $R^2$ ranges between zero and one and indicates how well the proposed model explains the variation in the response variable: the higher the value of $R^2$, the better the proposed model.
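A short sketch of the $R^2$ computation follows, under the assumption that the "simplest model" is the intercept-only model and that the weighted absolute differences are check-loss values $\rho_\tau$; the helper names are hypothetical. For the intercept-only model, the optimal constant is attained at one of the observations, so it suffices to search over them. If the full model contains an intercept and is fitted by minimizing the same loss, then $\hat V(\tau) \le \tilde V(\tau)$, which keeps $R^2$ between zero and one.

```python
def check_loss(u, tau):
    """rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_check_loss(ys, fitted, tau):
    """Sum of weighted absolute differences between observations and fitted quantiles."""
    return sum(check_loss(y - f, tau) for y, f in zip(ys, fitted))


def intercept_only_loss(ys, tau):
    """V-tilde: loss of the best constant tau-quantile (attained at an observation)."""
    return min(total_check_loss(ys, [c] * len(ys), tau) for c in ys)


def quantile_r2(ys, fitted, tau):
    """R^2(tau) = 1 - V-hat / V-tilde."""
    return 1.0 - total_check_loss(ys, fitted, tau) / intercept_only_loss(ys, tau)


ys = [1.0, 3.0, 5.0, 7.0, 9.0]
good_fit = [1.2, 2.9, 5.0, 7.1, 8.8]   # hypothetical fitted 0.5-quantiles
print(round(quantile_r2(ys, good_fit, 0.5), 4))  # prints 0.95
```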

RESULTS AND DISCUSSION
Data analysis begins by fitting the hypothesized model with the OLS method to select the significant variables to include in the quantile and Bayesian analyses. Based on the OLS analysis, the variables Age, Diagnosis, and Discharge status contributed significantly. A model of the length of stay of COVID-19 patients is then constructed using the quantile regression method and the Bayesian quantile regression method. The results of the two methods are compared using the width of the 95% confidence intervals and the $R^2$ value at each selected quantile. The quantiles used are 0.10, 0.25, 0.50, 0.75, and 0.90. The data were analyzed with R software. The results of both methods are provided in Table 3. In Table 3, it can be seen that for the quantile regression method, the … From the results of this estimation, the Bayesian quantile regression method as a whole yields more significant parameters and narrower 95% confidence intervals than the quantile regression method.
The best method and model are determined by the highest value of $R^2$. The $R^2$ values of both methods at all selected quantiles are provided in Table 4. Table 4 shows that for the quantile regression method, the model at quantile 0.75 is the best model because it has the highest $R^2$ value, 0.93925. This value indicates that the proposed model explains 93.925% of the variance of the length of hospital stay of patients with COVID-19, so the model at quantile 0.75 is acceptable. For the Bayesian quantile regression method, the model at quantile 0.75 is also the best model because it has the highest $R^2$ value, 0.94244, indicating that it explains 94.244% of the variance of the length of stay of COVID-19 patients. Because the $R^2$ value of the Bayesian quantile regression model is higher than that of the quantile regression model at the corresponding quantiles, we conclude that the Bayesian quantile method tends to produce a better model than the quantile method. Therefore, the best model for the length of stay of COVID-19 patients in West Sumatra is the model at quantile 0.75 based on the Bayesian quantile regression method. This proposed model is formulated as follows:

Furthermore, the convergence of the parameters of the proposed model was tested. Owing to limited space, selected results of this test are shown in Figure 2. In Figure 2 (a), the trace plot fluctuates around a stable value, so the model parameters can be considered to have converged. In part (b), the density plot resembles a normal distribution curve, suggesting that the posterior distributions of the model parameters are approximately normal. In part (c), the ACF plot shows small autocorrelation values, indicating that there is no substantial autocorrelation between samples.
Based on these convergence diagnostics, it can be concluded that the model parameters have converged and that the proposed model can be accepted.
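The ACF check in Figure 2 (c) can be reproduced from the stored draws of any parameter. The sketch below computes the sample autocorrelation at lags $1, \ldots, K$ with the usual estimator $\hat\rho_k = \sum_{t=1}^{n-k}(x_t - \bar x)(x_{t+k} - \bar x) / \sum_{t=1}^{n}(x_t - \bar x)^2$ and contrasts an independent chain with a strongly autocorrelated AR(1) chain. The simulated chains and the function name are illustrative assumptions, not the study's MCMC output.

```python
import random


def autocorrelation(chain, max_lag):
    """Sample autocorrelations rho_1..rho_max_lag of an MCMC chain."""
    n = len(chain)
    mean = sum(chain) / n
    dev = [x - mean for x in chain]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


rng = random.Random(42)
independent = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # well-mixing chain

sticky = [0.0]                                              # AR(1) chain with phi = 0.9
for _ in range(19999):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0.0, 1.0))

print([round(r, 3) for r in autocorrelation(independent, 5)])
print([round(r, 3) for r in autocorrelation(sticky, 5)])
```

Autocorrelations near zero at all lags, as in the first chain, correspond to the "small autocorrelation" pattern described for Figure 2 (c); slowly decaying values, as in the second chain, would signal poor mixing.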

CONCLUSIONS
This study found that the length of stay of COVID-19 patients in West Sumatra was influenced by Age, Diagnoses, and Discharge status. The analysis shows that the Bayesian quantile regression method models the length of stay of COVID-19 patients better than the quantile regression method: its 95% confidence intervals are narrower and its $R^2$ values are greater.