Comparisons between Resampling Techniques in Linear Regression: A Simulation Study

Parameter estimation in linear regression must satisfy several assumptions; once the assumptions are not fulfilled, the conclusions become questionable. The bootstrap and the jackknife are resampling techniques that do not require such assumptions when estimating β̂. This study aims to compare resampling techniques in linear regression. The data used in the study are clean, without any influential observations, outliers, or leverage points. The ordinary least squares method was used as the primary method to estimate the parameters and was then compared with the resampling techniques. The variance, p-value, bias, and standard error were used as criteria to identify the best method among the random bootstrap, residual bootstrap, and delete-one jackknife. The analysis found that the random bootstrap did not perform well, while the residual bootstrap and delete-one jackknife worked quite well. The random bootstrap, residual bootstrap, and jackknife all estimate the parameters better than ordinary least squares. The study also found that the residual bootstrap works well for estimating the parameters in small samples. At the same time, the jackknife is suggested when the sample size is big, because the jackknife is easier to apply than the residual bootstrap and works well when the sample size is large.


INTRODUCTION
Regression analysis is a statistical analysis that constructs relationships between a dependent or response variable and independent or regressor variables (X₁, X₂, …, Xₖ). Ordinary least squares (OLS) is the traditional way of finding the parameter estimates β̂, but it relies strongly on assumptions [1]. The reliability and validity of the conclusions in regression analysis are essential ([2], [3]), and they depend on how far the data follow the assumptions and on the sample size of the data. Resampling makes it easier to find the estimated regression coefficient β̂ without any assumption on the distribution. The bootstrap and the jackknife are resampling techniques that do not need any assumptions in estimating β̂ ([4]-[6]). Sahinler and Topuz [7] compared the bootstrap and jackknife methods. Their research discussed strategies for building a regression model using the jackknife and bootstrap methods. The four methods used in their research are the bootstrap based on resampling observations, the bootstrap based on resampling errors, delete-one jackknife regression, and delete-d jackknife regression. These methods were used to find the parameter estimates, bias, standard errors, and confidence intervals. Their research concluded that a large number of bootstrap replicates ensures that the estimates are close to the true parameter. They also suggested that a smaller number of bootstrap replicates is sufficient for estimating the variance, while B = 1000 is needed for estimating the standard errors. Their research tested the accuracy of the bootstrap and jackknife methods in estimating the distribution of the regression parameters with various sample sizes and various numbers of bootstrap replicates. Sahinler and Topuz [7] and Li et al. [8] found that the bootstrap method is appropriate for linear regression and is usable even when the errors are not normally distributed. Algamal and Rasheed [9] further developed resampling in linear regression.
The advantage of bootstrap approximation is that, in general, it needs a smaller sample than ordinary least squares for estimating the parameters. Meanwhile, the disadvantages of bootstrap methods were discussed in Ma et al. [10], Wan et al. [11], [12], and Phaladiganon et al. [13]. A few of the disadvantages are as follows: a) the bootstrap distribution of β̂* is not a good approximation of the distribution of β̂ if the sample size is small and outliers exist; b) the bootstrap is not suggested for dependent data structures such as time series; and c) the residual bootstrap is not preferable when the assumptions are violated.
Based on that, this study aims to compare parameter estimates of multiple linear regression based on several resampling methods. There are several ways to estimate β̂ with the bootstrap and the jackknife. The scope of this research is to investigate the bootstrap and jackknife methods under different scenarios. This research considered the random bootstrap, the residual bootstrap, and the delete-one jackknife, and is limited to the multiple linear regression model. First, samples of different sizes will be selected and the parameters estimated. The bias and variance will be observed, and then the relationship between the bias and variance will be investigated. The distribution of the estimates will also be observed as the sample size increases. Bootstrap resampling with different numbers of bootstrap replicates and sample sizes gives less bias than ordinary least squares. The jackknife coefficient is calculated using

β̂_Jack = (1/n) Σᵢ β̂₍₋ᵢ₎,

where n is the sample size and β̂₍₋ᵢ₎ is the parameter estimate for the sample formed after deleting the i-th observation. The bootstrap coefficient is calculated from

y*_b = Xβ̂ + e*_b,  β̂*_b = (X′X)⁻¹X′y*_b,  β̂_Boot = (1/B) Σ_b β̂*_b,

where b = 1, 2, …, B indexes the bootstrap replicates, e*_b is a resample of the regression errors, X is the matrix of independent variables, and β̂ is the parameter estimate from the ordinary least squares method.
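As a concrete illustration, the residual bootstrap and delete-one jackknife coefficients described above can be sketched in Python. The data, sample size, seed, and number of replicates below are illustrative assumptions, not the study's pressure-drop data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data: n observations, intercept + 2 regressors.
n, B = 30, 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def ols(X, y):
    # Least-squares solution of X beta = y, i.e. (X'X)^{-1} X'y.
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta_hat = ols(X, y)
resid = y - X @ beta_hat

# Residual bootstrap: y*_b = X beta_hat + e*_b, refit for each replicate b.
boot = np.array([ols(X, X @ beta_hat + rng.choice(resid, size=n, replace=True))
                 for _ in range(B)])
beta_boot = boot.mean(axis=0)   # bootstrap coefficient

# Delete-one jackknife: refit after removing each observation in turn.
loo = np.array([ols(np.delete(X, i, axis=0), np.delete(y, i))
                for i in range(n)])
beta_jack = loo.mean(axis=0)    # jackknife coefficient
```

Both resampled coefficients should land close to the full-sample OLS estimate on clean data such as this.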


Data
The data used in this study are pressure-drop data, which are available in Montgomery et al. [16]. There is one dependent variable, y, and four independent variables, X₁, X₂, X₃, and X₄. There are 62 observations in the data. The data were collected from research in which the pressure drop was measured for two-phase flow through screen-plate bubble columns. The research was conducted to investigate the cause of the pressure drop through the bubble cap. A bubble column is used to observe the reaction between the gas and the liquid.
The first factor considered in that research is the superficial fluid velocity of the gas; the gas's speed and direction of motion are measured by the flow in the column. The second factor is the kinematic viscosity: the friction caused by the thickness of the gas as it moves through the liquid particles. The third factor is the mesh opening, the distance across the space between two parallel threads. The last factor used in the research is a dimensionless number, not associated with a physical dimension, calculated to relate the gas's superficial fluid velocity to the liquid's superficial fluid velocity. For building the model, the dependent variable y denotes the dimensionless factor for the pressure drop through a bubble cap. The independent variables are X₁ (superficial fluid velocity of the gas), X₂ (kinematic viscosity), X₃ (mesh opening, cm), and X₄ (dimensionless number relating the gas's superficial fluid velocity to the liquid's superficial fluid velocity).

Simulation Study Scenarios
The original data will be analyzed using ordinary least squares regression. Then assumption checking will be conducted using the residuals of the model. Then, using the same original data, residual bootstrap and random bootstrap resampling will be conducted with four different sample sizes: 20, 40, 50, and 62. Each sample size will be used with three different numbers of bootstrap replicates, namely 100, 1000, and 10000.
For the delete-one jackknife, the resampling will be conducted at different sample sizes, namely 20, 40, 50, and 62. The bias, variance, standard error, and p-value will be calculated for each method. The best method among these three will be chosen according to the values of bias, variance, standard error, and p-value.
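The comparison criteria named above can be computed from a matrix of replicate estimates. A minimal sketch (the function and variable names are our own, not from the paper):

```python
import numpy as np

def resampling_metrics(estimates, beta_ref):
    """Bias, variance, and standard error of resampled coefficient
    estimates (rows = replicates, columns = parameters), measured
    against a reference estimate beta_ref."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean(axis=0) - beta_ref
    var = estimates.var(axis=0, ddof=1)   # sample variance across replicates
    return bias, var, np.sqrt(var)

# Tiny worked example: two replicates, two parameters.
bias, var, se = resampling_metrics([[1.0, 2.0], [3.0, 4.0]], [2.0, 3.0])
# bias = [0, 0], var = [2, 2], se = [sqrt(2), sqrt(2)]
```

The same function applies unchanged to bootstrap replicates or to the collection of delete-one jackknife estimates.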

RESULTS AND DISCUSSION
In this study, the full model was used as the reference, meaning all independent variables were included in the model regardless of their significance. The fitted full regression model obtained by ordinary least squares using SAS software is

ŷ = 5.88839 − 0.48460X₁ + 0.18263X₂ + 35.39109X₃ + 5.92695X₄.
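A fit of this form can be reproduced outside SAS. The sketch below computes OLS coefficients, their standard errors, and t statistics with NumPy; the data are synthetic stand-ins (the actual pressure-drop data are in Montgomery et al. [16] and are not reproduced here):

```python
import numpy as np

def ols_summary(X, y):
    # OLS coefficients, their standard errors, and t statistics.
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # Var(beta_hat) under OLS
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

rng = np.random.default_rng(42)
n = 62                                         # same size as the study's data
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
true_beta = np.array([5.9, -0.5, 0.2, 35.4, 5.9])  # hypothetical values
y = X @ true_beta + rng.normal(size=n)
beta, se, t = ols_summary(X, y)
```

p-values follow by referring each t statistic to a t distribution with n − p degrees of freedom.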

Random Bootstrap Approach
The random bootstrap technique was first used to analyze the data. The resampling was conducted at different sample sizes: 20, 40, 50, and 62. Three numbers of bootstrap replicates were applied at every sample size, namely 100, 1000, and 10000. Table 1 shows the changes in β̂₃ and β̂₀ at different sample sizes and numbers of bootstrap replicates. For each parameter estimate, the bias changes with the sample size; more specifically, the bias gets smaller as the sample size increases. The variance of β̂₃ decreases from 574.9345 when the sample size is 20 to 61.2876 when the sample size is 50, but the bias of β̂₃ increases when the sample size is 62. Overall, as the sample size increased from 20 to 62, the variance of the parameter estimates decreased. Meanwhile, the bias decreases as the number of bootstrap replicates increases. When B was set to 100, the intercept shows a bias of 1.5437. This value decreases to 1.4503 when the number of bootstrap replicates, B, increases to 1000, and decreases again to 1.3527 when B is increased to 10000. When B increases from 100 to 1000, the variance decreases from 125.2369 to 116.7295. It decreases further to 37.2858 when B equals 10000, a 70.23% reduction compared with 125.2369.
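The random (case) bootstrap resamples whole (x, y) observations with replacement and refits the model on each resample. A sketch of how bias and variance entries like those in Table 1 arise, on synthetic data (sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: intercept + one regressor.
n, B = 40, 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.5]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Random bootstrap: draw B index sets of size n with replacement,
# refit OLS on each resampled set of whole observations.
idx = rng.integers(0, n, size=(B, n))
boot = np.array([np.linalg.lstsq(X[i], y[i], rcond=None)[0] for i in idx])

bias = boot.mean(axis=0) - beta_hat   # bootstrap bias of each coefficient
var = boot.var(axis=0, ddof=1)        # bootstrap variance of each coefficient
```

Increasing B shrinks the Monte Carlo noise in these bias and variance estimates, which mirrors the trend reported above.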


Residual Bootstrap Approach
The second resampling technique used to analyze the data was the residual bootstrap. This section presents the parameter estimates, bias, and variances of the parameter estimates using the residual bootstrap. The results for β̂₀ and β̂₁ are shown in Table 2. With the residual bootstrap, the results were clearer than with the random bootstrap: there is a clear trend in the parameter estimates, bias, and variance across the different sample sizes and numbers of bootstrap replicates. The bias decreases as the sample size increases. When n = 20, the bias is 0.2307; when the sample size increased to 40 the bias became 0.2266; the bias is 0.0684 when the sample size is 50; and finally, when n is 62, the bias became 0.01368. In general, there is a noticeable decrease in bias as the sample size increases. The variance also decreases as the sample size increases to 50 and 62, where it becomes 19.8861 and 15.4785, respectively. Turning to the changes in bias caused by the number of bootstrap replicates, B, as it increases from one hundred to one thousand and then ten thousand: for the estimated constant β̂₀, when the sample size is 40, the bias changes from 2.6535 to 2.3838 and then to 2.2883 as B increases from 100 to 1000 and then 10000. The variance also decreases as the number of bootstrap replicates increases.

Delete-one Jackknife Approach
The third technique used in this research is the delete-one jackknife. The method was applied with different sample sizes: 20, 40, 50, and 62. Table 3 and Figure 1 display the changes in bias of all parameters for the delete-one jackknife. The bias decreases as the sample size increases, but when the sample size equals the population size the bias increases; using the population as the sample might produce this kind of result. Plots of variance versus sample size for all parameters are shown in Figure 2. From the plots, it can be seen that the variance also decreases from sample size 20 to sample size 62. Small variances give better estimation in linear regression. The bias and variance are also not interrelated in the delete-one jackknife. The p-values show that all parameter estimates are significant, and the standard errors clearly show that increasing the sample size gives better estimation. The difference between the residual bootstrap and random bootstrap estimates is obvious when the sample size is 20 (small): the residual bootstrap provided better parameter estimates than the random bootstrap in terms of bias and variance, which shows that the residuals have a big influence in linear regression. However, as the sample size increases, the residual and random bootstrap methods show similar results. Increasing the number of bootstrap replicates and the sample size gave better parameter estimates in both methods. The delete-one jackknife gave a small variance, but a large bias when the sample size was small; both the bias and the variance decrease as the sample size increases.
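The jackknife standard errors discussed here can be sketched with the standard delete-one (Tukey) variance formula; the helper below is our own illustration on synthetic data, not the study's implementation:

```python
import numpy as np

def jackknife_se(X, y):
    """Delete-one jackknife standard error of the OLS coefficients:
    Var_jack = (n - 1)/n * sum_i (beta_(-i) - beta_bar)^2."""
    n = len(y)
    loo = np.array([np.linalg.lstsq(np.delete(X, i, axis=0),
                                    np.delete(y, i), rcond=None)[0]
                    for i in range(n)])
    beta_bar = loo.mean(axis=0)
    var = (n - 1) / n * ((loo - beta_bar) ** 2).sum(axis=0)
    return np.sqrt(var)

rng = np.random.default_rng(7)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
se = jackknife_se(X, y)   # roughly comparable to the OLS standard errors
```

Because the computation requires only n refits and no choice of B, the jackknife is easier to apply than the bootstrap at large sample sizes, which matches the recommendation in the conclusions.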

CONCLUSIONS
The residual bootstrap, random bootstrap, and delete-one jackknife were compared. The jackknife is not advisable when the sample size is small. However, when the sample

size is big enough, which is near the population size, it will give better parameter estimation than the random bootstrap and residual bootstrap. In a situation where the sample size is small due to cost considerations, it is better to use the residual bootstrap than the other methods in linear regression. In conclusion, it is advisable to use the residual bootstrap when the sample is small, and larger numbers of bootstrap replicates will give better parameter estimation. The jackknife can be used when the sample size is big enough; it is especially useful when the sample size is so big that both the random and residual bootstrap would take a long time to process.
In the future, this research can be extended to observe how these methods behave when there is an outlier, influential point, or leverage point. Moreover, the comparisons may involve other resampling techniques to determine which methods work well in multiple linear regression.