A Combination of Generalized Linear Mixed Model and LASSO Methods for Estimating Number of Patients Covid 19 in the Intensive Care Units

A generalized linear mixed model (GLMM) combined with the L1 penalty (Least Absolute Shrinkage and Selection Operator, LASSO) is called LASSO GLMM. LASSO GLMM reduces overfitting and selects predictor variables during modeling. The aim of this study is to evaluate the performance of the model in predicting which Covid-19 patients with certain congenital diseases require the ICU, based on laboratory blood test results and the patient's vital signs. The study used a binary response variable: 1 if the patient was admitted to the ICU and 0 if the patient was not. The fixed-effect predictor variables are the laboratory blood test results and the patient's vital signs. The random-effect predictor variable is the patient's congenital disease. The results showed that the average accuracy and AUC of LASSO GLMM are higher than those of LASSO GLM at the 5% level of significance. Respiratory rate and Lactate have a significant effect on predicting the ICU needs of Covid-19 patients. The random effect of the patient's congenital disease is significant at the 5% level of significance, which means that the ICU needs of Covid-19 patients vary among congenital diseases. We conclude that LASSO GLMM with the random effect of the patient's congenital disease has better modeling performance in predicting the ICU needs of Covid-19 patients based on laboratory blood test results and the patient's vital signs. The results of this modeling can help detect Covid-19 patients who need the ICU quickly and can help medical staff use ICU resources optimally.


INTRODUCTION
The generalized linear model (GLM) is an approach for modeling the effect of predictor variables on a response variable whose distribution belongs to the exponential family. Observations within the same group are usually correlated, so the GLM is extended to include random effects in the linear predictor. A GLM with added random effects is called a generalized linear mixed model (GLMM) [1]. GLMM modeling becomes more complex as the number of predictor variables grows: the more predictor variables are used, the more unstable the estimates become [2]. Predictor variables that are unrelated to the response variable cause overfitting. To improve the prediction accuracy of the model, a penalty is added to the modeling [3].
The addition of a penalty function to the model was introduced by Tibshirani (1996) using the L1 penalty, namely $\lambda \sum_{j=1}^{p} |\beta_j|$, which is called the Least Absolute Shrinkage and Selection Operator (LASSO). Lambda (λ) in the L1 penalty function is a shrinkage parameter that determines how strongly the regression coefficients are shrunk. LASSO reduces overfitting and selects predictor variables during modeling [4]. In this study, the combinations of GLM and GLMM with the LASSO technique are called LASSO GLM and LASSO GLMM. Researchers have discussed various problems in LASSO GLM, such as Arnold and Tibshirani (2016) [5], Hossain et al. (2015) [6], Zhang and Zou (2014) [7], Simon et al. (2013) [8], and Friedman et al. (2010) [9]. LASSO GLM optimizes its objective function by coordinate descent. This algorithm is available in the R programming language in the glmnet package [9].
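The shrinkage behaviour of the L1 penalty can be illustrated with a minimal Python sketch. It is not the glmnet implementation: the function names and the coefficient values are hypothetical, and the soft-thresholding update shown here is the closed-form coordinate update only for the simple case of standardized predictors in a linear model. It is meant only to show how λ pushes small coefficients exactly to zero.

```python
def l1_penalty(beta, lam):
    """Return lam * sum(|beta_j|), the LASSO penalty term."""
    return lam * sum(abs(b) for b in beta)


def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam, or set it to zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients, for illustration only.
coefs = [1.5, -0.3, 0.02, -2.0]
lam = 0.5

shrunk = [soft_threshold(b, lam) for b in coefs]
print(shrunk)                  # [1.0, 0.0, 0.0, -1.5]
print(l1_penalty(coefs, lam))  # penalty paid by the unpenalized coefficients
```

Coefficients whose magnitude is below λ are removed from the model, which is how the penalty performs variable selection as well as shrinkage.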
Several researchers have discussed variable selection procedures in GLMM using the L1 penalty, including Thomson and Hossain (2018) [10], Groll and Tutz (2014) [2], Schelldorfer et al. (2011) [11], and Ibrahim et al. (2010) [12]. LASSO GLMM produces stable estimates because the L1 penalty selects the important predictor variables used in the GLMM [2]. GLMMs with the L1 penalty are useful whenever there is a grouping structure among high-dimensional observations [11]. Previous studies have also proposed an algorithm for maximum likelihood estimation in the GLMM with the L1 penalty function added. The penalized log-likelihood function is maximized using a gradient ascent algorithm, and this algorithm is called GLMMLasso [13]. In the R programming language, the GLMMLasso algorithm is provided by the glmmLasso package [14].
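As a sketch of the objective being maximized, the penalized log-likelihood can be written in the generic form below. The notation is an assumption made for illustration: $\ell$ denotes the GLMM log-likelihood, $\beta$ the fixed-effect coefficients, $\theta$ the variance parameters of the random effects, and $p$ the number of penalized predictors. The exact form used inside glmmLasso may differ in detail.

```latex
\ell_{\text{pen}}(\beta, \theta) = \ell(\beta, \theta) - \lambda \sum_{j=1}^{p} |\beta_j|
```

Setting $\lambda = 0$ recovers the unpenalized GLMM, while larger values of $\lambda$ shrink more fixed-effect coefficients to zero.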
In this study, we apply LASSO GLM and LASSO GLMM to predict the ICU needs of Covid-19 patients. The surge in Covid-19 cases is putting enormous pressure on the health care system, and the Intensive Care Unit (ICU) is one of the health facilities needed by patients with confirmed Covid-19. The ICU needs of Covid-19 patients were analyzed using laboratory blood test results, vital signs, and the patient's congenital disease. The laboratory blood test results and the patient's vital signs were treated as fixed effects, whereas the patient's congenital disease was treated as a fixed effect in LASSO GLM and as a random effect in LASSO GLMM. A previous study compared the performance of LASSO GLM and LASSO GLMM on rainfall data and found that LASSO GLMM performed better than LASSO GLM [15]. The aim of this study is to evaluate the model's performance in predicting which Covid-19 patients in certain congenital disease groups require the ICU, based on laboratory blood test results and the patient's vital signs.

Data
The study used data from patients with confirmed Covid-19 at the Sírio-Libanês Hospital, São Paulo, Brazil. Data were collected after the confirmed Covid-19 patients had undergone 12 hours of treatment in the hospital. The data comprised 98 patients: 52 ICU patients and 46 non-ICU patients.
The study used a binary response variable: 1 if the patient was admitted to the ICU and 0 if the patient was not. A total of 32 fixed-effect predictor variables were used in modeling: 26 variables from the laboratory blood test results and 6 variables from the patient's vital signs. The fixed-effect predictor variables are listed in Table 1. The patient's congenital disease was treated as a fixed-effect predictor variable in LASSO GLM and as a random-effect predictor variable in LASSO GLMM.
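One way to write the mixed model described here is as a logistic model with a random intercept for each congenital disease group. This form is an assumption made for illustration, since the text does not state the link function or the random-effect structure explicitly; $i$ indexes the congenital disease group, $j$ the patient within that group, and $x_{ij}$ the vector of the 32 fixed-effect predictors.

```latex
\operatorname{logit} P(y_{ij} = 1 \mid b_i) = \beta_0 + x_{ij}^{\top}\beta + b_i,
\qquad b_i \sim N(0, \sigma_b^2)
```

Under this sketch, LASSO GLM corresponds to replacing the random intercept $b_i$ with dummy-coded fixed effects for the congenital disease.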

Research methods
Modeling was carried out to predict the ICU needs of Covid-19 patients based on laboratory blood test results, vital signs, and congenital diseases. Many predictor variables are used in the modeling. The important predictor variables are selected, and a simpler model is obtained, by adding the L1 penalty function to the model. The steps of the analysis were as follows: [...] then $H_0$ is rejected. d. Determine the model accuracy.
To evaluate the performance of LASSO GLM and LASSO GLMM, we selected the best model for predicting the ICU needs of a patient with Covid-19. The best model was selected based on accuracy and AUC. The steps for selecting the best model were as follows:
a. Partition the data into 80% modeling data and 20% validation data. The partitioning was repeated 30 times.
b. Fit LASSO GLM and LASSO GLMM to the modeling data in each replication.
c. Assess model performance using the AUC and accuracy on the validation data in each replication.
d. Test the performance difference between LASSO GLM and LASSO GLMM statistically using a paired sample t-test.
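Steps a and c of the procedure above can be sketched in plain Python as follows. This is an illustrative sketch rather than the code used in the study: the function names, the random seed, the 0.5 classification threshold, and the choice to round the modeling share up are all assumptions. The model fitting of step b is left out because it depends on the R packages.

```python
import math
import random


def repeated_holdout(n, train_frac=0.8, replications=30, seed=0):
    """Return `replications` random (modeling, validation) index splits."""
    rng = random.Random(seed)
    # Rounding up is an assumption; for n = 98 it gives 79 modeling rows.
    n_train = math.ceil(train_frac * n)
    splits = []
    for _ in range(replications):
        idx = list(range(n))
        rng.shuffle(idx)
        splits.append((idx[:n_train], idx[n_train:]))
    return splits


def accuracy(y_true, probs, threshold=0.5):
    """Share of cases where the thresholded probability matches the label."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, probs))
    return hits / len(y_true)


def auc(y_true, probs):
    """AUC as the share of (ICU, non-ICU) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case from each class")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


splits = repeated_holdout(98)
train_idx, valid_idx = splits[0]
print(len(splits), len(train_idx), len(valid_idx))  # 30 79 19
```

In each replication, the fitted model would produce predicted probabilities for the validation rows, and `accuracy` and `auc` would be applied to those predictions.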

LASSO GLM Modeling
LASSO GLM selects variables based on λ. The optimum λ is the value at which the binomial deviance is minimal. The cross-validation plot used to optimize the LASSO GLM shrinkage parameter is shown in Figure 1. Based on Figure 1, the optimum λ was 0.024. All predictor variables included in this model are fixed-effect predictor variables: 26 laboratory blood test results, 6 vital signs, and the patient's congenital disease coded as dummy variables.
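The selection rule for λ can be sketched as below. The deviance values in the example are hypothetical and only illustrate the rule; of the numbers shown, only λ = 0.024 comes from the text. The mean binomial deviance is computed here as minus twice the average log-likelihood, which is an assumed convention.

```python
import math


def binomial_deviance(y_true, probs, eps=1e-12):
    """Mean binomial deviance: -2 times the average Bernoulli log-likelihood."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total / len(y_true)


def pick_lambda(deviance_by_lambda):
    """Return the candidate lambda with the smallest deviance."""
    return min(deviance_by_lambda, key=deviance_by_lambda.get)


# Hypothetical cross-validated deviances for three candidate lambdas.
cv_deviance = {0.001: 1.10, 0.024: 0.85, 0.100: 0.97}
print(pick_lambda(cv_deviance))  # 0.024
```

A model that predicts 0.5 for every patient has a mean deviance of 2 ln 2, about 1.386, which gives a rough reference point for reading such a curve.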

Alona Dwinata
LASSO GLM modeling used the R package glmnet. The plot of the LASSO GLM coefficients for each value of log λ is shown in Figure 2. The non-zero regression coefficients obtained from the LASSO GLM modeling are shown in Table 2.

LASSO GLMM Modeling
Like LASSO GLM, LASSO GLMM requires an optimum λ. Figure 3 shows the binomial deviance for each value of λ. The optimum λ, obtained at the smallest deviance, is 19.6. LASSO GLMM modeling used the R package glmmLasso. The plot of the LASSO GLMM coefficient paths for each λ is shown in Figure 4. The regression coefficients shrink to zero as λ increases. LASSO GLMM modeling with λ = 19.6 resulted in 4 predictor variables with non-zero regression coefficients, which are shown in Table 3. The random effect of the patient's congenital disease had a standard deviation of 0.8262, with $\chi^2 = 4.12$ and $\chi^2_{(df=1,\ \alpha=0.05)} = 3.84$. Because 4.12 > 3.84, $H_0$ is rejected. This means that the random effect of the patient's congenital disease was significant at the 5% level of significance.
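The comparison of the test statistic with the chi-square critical value can be checked with a short Python sketch. It uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable; the function names are hypothetical, and the sketch simply follows the reference distribution stated in the text.

```python
import math
from statistics import NormalDist


def chi2_df1_critical(alpha):
    """Upper-alpha critical value of the chi-square distribution with df = 1."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def chi2_df1_pvalue(stat):
    """Upper-tail probability P(X > stat) for a chi-square variable with df = 1."""
    return math.erfc(math.sqrt(stat / 2.0))


critical = chi2_df1_critical(0.05)
p_value = chi2_df1_pvalue(4.12)
print(round(critical, 2))   # 3.84
print(4.12 > critical)      # True, so H0 is rejected at the 5% level
print(p_value < 0.05)       # True
```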

Selection of the best model
The data were divided randomly into 80% modeling data and 20% validation data, giving 79 patients as modeling data and 19 patients as validation data. The partitioning was repeated for 30 replications. In each replication, the optimum λ was obtained from the modeling data, and LASSO GLM and LASSO GLMM were then fitted. Modeling performance was assessed using the accuracy and AUC on the validation data. The accuracy and AUC of each model over the 30 replications are compared in Figure 5 and Figure 6. The performance difference between LASSO GLM and LASSO GLMM was tested statistically with a paired sample t-test on the AUC and the accuracy. The results of the paired sample t-test for the two models are shown in Table 4. The null hypotheses for the accuracy and AUC of the two models, stated so that rejecting them supports LASSO GLMM, are as follows:
H0: The average accuracy of LASSO GLMM is less than or equal to the average accuracy of LASSO GLM
H0: The average AUC of LASSO GLMM is less than or equal to the average AUC of LASSO GLM
The t-test results in Table 4 showed p-values below 0.05 for both accuracy and AUC, so both null hypotheses are rejected. This means that the average accuracy and AUC of LASSO GLMM are higher than the average accuracy and AUC of LASSO GLM at the 5% level of significance.
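The paired sample t-test on the replication results can be sketched as follows. The function name and the example values are hypothetical and are not the study's data. The decision uses the one-sided 5% critical value of the t distribution with 29 degrees of freedom, about 1.699, which applies when 30 paired replications are compared.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, df) for the paired differences scores_a - scores_b."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical AUC values from three replications, for illustration only.
auc_glmm = [0.90, 0.80, 0.95]
auc_glm = [0.80, 0.75, 0.80]
t_stat, df = paired_t_statistic(auc_glmm, auc_glm)
print(round(t_stat, 3), df)  # 3.464 2

# With 30 replications (df = 29), the one-sided 5% critical value is about 1.699.
T_CRIT_DF29 = 1.699
```

With the study's 30 replications, a t statistic above `T_CRIT_DF29` for the differences (LASSO GLMM minus LASSO GLM) would support the one-sided conclusion that LASSO GLMM has the higher mean.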

Discussion
The ability to identify patients who need the ICU is essential. One way to address this problem is to identify the most important variables that affect the ICU needs of Covid-19 patients. The paired sample t-test of accuracy and AUC in Table 4 showed that LASSO GLMM performs better than LASSO GLM. Figure 4 shows the effect of the predictor variables for each value of λ. With λ = 19.6, the model produced four non-zero fixed-effect predictor variables, which are the focus of attention in predicting the ICU needs of Covid-19 patients: Lactate, systolic blood pressure, respiratory rate, and oxygen saturation. Among these four predictors, only respiratory rate had a significant effect at the 5% level of significance, while Lactate had a significant effect at the 10% level. Systolic blood pressure and oxygen saturation had no significant effect.
The odds ratio of respiratory rate was 165.67. This means that the odds of a Covid-19 patient requiring the ICU are 165.67 times higher for each one-unit increase in respiratory rate (respirations per minute, rpm). Covid-19 damages the respiratory system. Respiratory rate is one measure used to identify respiratory tract infections immediately before and during the first days of symptoms. The normal respiratory rate for adults at rest is 12 to 20 rpm [16]. The findings of one study suggest that the stability of respiratory rate measurements taken during night rest in healthy individuals is a useful metric for tracking changes in health [16]. The odds ratio of Lactate was 0.52. This means that the odds of a Covid-19 patient requiring the ICU are multiplied by 0.52, a decrease of 48%, for each one-unit increase in Lactate (mmol/L). Arterial lactatemia higher than that of the central vein (a reversed delta a-cv lactate) indicates a disturbance in the mitochondrial metabolism of lung cells caused by severe inflammation [17]. An increase of one unit in venous blood lactate reduces the reversed delta a-cv lactate.
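The relationship between the reported odds ratios and the underlying logistic regression coefficients can be made explicit with a short sketch. It assumes the usual logistic-model relation, odds ratio = exp(coefficient); the function names are hypothetical.

```python
import math


def odds_ratio_to_coef(odds_ratio):
    """Log-odds coefficient implied by an odds ratio."""
    return math.log(odds_ratio)


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0


for name, oratio in [("Respiratory rate", 165.67), ("Lactate", 0.52)]:
    coef = odds_ratio_to_coef(oratio)
    change = percent_change_in_odds(oratio)
    print(f"{name}: coefficient {coef:.3f}, odds change {change:+.0f}%")
```

An odds ratio above 1 corresponds to a positive coefficient (about 5.11 for respiratory rate), and an odds ratio below 1 to a negative one (about -0.65 for Lactate).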
LASSO GLMM produced an AUC of 0.96, which means that LASSO GLMM has good predictive performance for the ICU needs of Covid-19 patients. The random effect of the patient's congenital disease was significant at the 5% level of significance, which means that the ICU needs of Covid-19 patients vary among congenital diseases. We conclude that LASSO GLMM with the random effect of the patient's congenital disease has better modeling performance in predicting the ICU needs of Covid-19 patients.

CONCLUSIONS
In this study, LASSO GLMM performed better than LASSO GLM in predicting the ICU needs of Covid-19 patients. LASSO GLMM has good predictive performance, with an AUC of 0.96. In LASSO GLMM, respiratory rate has a significant effect at the 5% level of significance and Lactate has a significant effect at the 10% level. Respiratory rate shows the largest significant effect on predicting the ICU needs of Covid-19 patients. The random effect of the patient's congenital disease had a significant effect on whether Covid-19 patients required the ICU at the 5% level of significance, which means that the ICU needs of Covid-19 patients vary among congenital diseases. We conclude that LASSO GLMM with the random effect of the patient's congenital disease has better modeling performance in predicting the ICU needs of Covid-19 patients based on laboratory blood test results and the patient's vital signs.