Restricted Maximum Likelihood Method as An Alternative Parameter Estimation in Heteroscedastic Regression

Students are members of the community who have an income, which comes from pocket money, scholarships, part-time jobs, and so forth. Many of them try to be trendsetters in the way they dress. Consumption patterns strongly influence saving behavior. If savings increase, both public funds and investment increase, and if investment increases, economic growth increases as well. The purpose of this research is to estimate multiple regression parameters using the REML method in modeling students' savings at the Faculty of Mathematics and Natural Science, Brawijaya University. The variables used were the student's age, the parents' income, the student's pocket money, the student's additional income, the student's consumption, and the student's saving. The REML method can overcome heteroscedasticity of the error variance and provides an unbiased estimator. The model of student's saving obtained with the REML method is: $\hat{Y}_i = -1609 + 112 X_1 + 0.0088 X_2 + 0.0504 X_3 + 0.4706 X_4 - 0.636 X_5$. Student's saving is significantly affected by the student's age ($X_1$), the student's additional income ($X_4$), and the student's consumption ($X_5$).


INTRODUCTION
Regression analysis is used to build a functional model of data in order to explain or predict one phenomenon on the basis of other phenomena. It was introduced by Sir Francis Galton (1822-1911). The purpose of regression analysis is prediction based on the relationship between the predictor variables and the response variable [1]. Based on the shape of that relationship, regression analysis is divided into linear regression and non-linear regression. Linear regression models the relationship between a dependent variable y and one or more explanatory (independent) variables denoted by X. The parameter estimation method most often used in multiple linear regression is Ordinary Least Squares (OLS), which minimizes the sum of squared residuals (errors). The OLS method requires several classical assumptions in order to yield the Best Linear Unbiased Estimator (BLUE). These assumptions concern the errors generated by the model: normality of the errors, non-autocorrelation, homoscedasticity, and non-multicollinearity.
Homoscedasticity is one of the important assumptions in regression analysis. It holds when the variance of the error term is constant; otherwise the errors are heteroscedastic. Under heteroscedasticity, estimation of the regression parameters gives too much weight to a small subset of the data (namely the subset where the error variance is largest). Restricted Maximum Likelihood (REML) is known as an unbiased parameter estimation method. The REML method can be applied to models whose experimental errors are normally distributed, correlated, and have unequal variances. REML variance component estimation can be used even if the data do not meet the assumptions of analysis of variance [2].
Economics is one of the social fields that often uses regression analysis to make decisions. One well-developed economic theory is consumption theory, which states that any individual who has an income is assumed to set aside the part of that income that remains after consumption [3].
Consumption patterns significantly affect saving behavior. Indonesian society is known as a consumer society, which can lead to low motivation to save. The benefits of saving include reducing consumerist habits, practicing thrift, and building a reserve fund. If savings increase, both public funds and investment increase [4]. Students are members of the community who have an income.
The purpose of this research is to estimate multiple regression parameters using the REML method in modeling students' savings at the Faculty of Mathematics and Natural Science, Brawijaya University.

METHODS
The variables used in this study consist of one response variable and five predictor variables. The response variable is the student's saving ($Y$). The five predictor variables that affect student savings and are used in this research are:
$X_1$ = the student's age (years),
$X_2$ = the income of the student's parents (thousand rupiah),
$X_3$ = the student's pocket money (thousand rupiah),
$X_4$ = the student's additional income (thousand rupiah), and
$X_5$ = the student's consumption (thousand rupiah).
Linear regression analysis is a statistical method for modeling the relationship between the response variable and the predictor variables. The model derived from regression analysis can be used to describe the phenomenon in the data and to predict values of the response variable. Prediction is valid only within the range of predictor values used to build the regression model [5]. The multiple linear regression model is:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \varepsilon_i, \quad i = 1, \ldots, n \tag{1}$$
where $Y_i$ = response of the $i$-th observation, $X_{ij}$ = value of the $j$-th predictor for the $i$-th observation, $\beta_j$ = regression coefficient, $\varepsilon_i$ = error, and $p$ = number of predictor variables. Equation (1) has $(p + 1)$ unknown parameters. The values $\{X_{i1}, \ldots, X_{ip},\ i = 1, \cdots, n\}$ are assumed fixed, and the errors $\{\varepsilon_i\}$ are assumed independent and normally distributed with mean 0 and variance $\sigma^2$. In matrix form, Equation (1) can be written:
$$Y = X\beta + \varepsilon \tag{2}$$
where
$Y$ = response vector of size $(n \times 1)$,
$X$ = predictor matrix of size $(n \times (p + 1))$,
$\beta$ = regression coefficient vector of size $((p + 1) \times 1)$,
$\varepsilon$ = error vector of size $(n \times 1)$.
The steps of data analysis are as follows:
1. Estimate the parameters using OLS. The Ordinary Least Squares method estimates the regression parameters by minimizing the sum of squared errors. The OLS estimator of the parameter $\beta$ is denoted $\hat{\beta}$. Based on model (2), the sum of squared errors is:
$$S(\beta) = \varepsilon'\varepsilon = (Y - X\beta)'(Y - X\beta) = Y'Y - 2\beta'X'Y + \beta'X'X\beta \tag{3}$$
The two cross terms combine because, by the properties of the matrix transpose, $\beta'X'Y = Y'X\beta$ is a scalar. The least squares estimator must therefore satisfy:
$$\left.\frac{\partial S}{\partial \beta}\right|_{\hat{\beta}} = -2X'Y + 2X'X\hat{\beta} = 0$$
which simplifies to
$$X'X\hat{\beta} = X'Y \tag{4}$$
Multiplying both sides of Equation (4) by $(X'X)^{-1}$ produces the least squares estimator for $\beta$:
$$(X'X)^{-1}X'X\hat{\beta} = (X'X)^{-1}X'Y \;\Rightarrow\; I\hat{\beta} = (X'X)^{-1}X'Y \;\Rightarrow\; \hat{\beta} = (X'X)^{-1}X'Y \tag{5}$$
2. Test the classical assumptions of multiple linear regression analysis.
The model derived from multiple regression analysis must meet the classical assumptions of regression analysis: normally distributed errors, homoscedasticity of the error variance, non-autocorrelation, and non-multicollinearity. The normality assumption requires that the error values ($\varepsilon_i$) obtained from the regression model follow a normal distribution. One method for detecting normality of the errors is the Shapiro-Wilk test [6].
The hypotheses tested are $H_0$: the errors are normally distributed, and $H_1$: the errors are not normally distributed. The test statistic G is obtained by converting the Shapiro-Wilk statistic, using conversion constants that depend on the number of observations n, so that G can be approximated by the standard normal distribution (Z). If G is less than the critical value of the Z distribution, $H_0$ is accepted, which means that the experimental error is normally distributed [7].
One of the assumptions of the classical regression model is homoscedasticity [8]. If the error variance is not constant, the errors are heteroscedastic. One method for detecting heteroscedasticity is the Glejser test. After obtaining the residuals $e_i$ from the OLS regression, Glejser suggests regressing the absolute residuals $|e_i|$ on the predictor variables, based on the hypotheses:
$H_0$: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2$
$H_1$: at least one $j$ where $\sigma_j^2 \neq \sigma^2$
If the test statistic is less than the critical point with $(n - p - 1)$ degrees of freedom, $H_0$ is accepted, which means that the error variance is homogeneous [8].
Autocorrelation is the correlation between members of a series of observations ordered in time (time series data) or in space (cross-section data). To detect autocorrelation, the Durbin-Watson test was used, based on:
$H_0$: $\rho = 0$ (errors are independent)
$H_1$: $\rho \neq 0$ (errors are not independent)
Test statistic:
$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
where:
$d$: Durbin-Watson statistic
$e_t$: the $t$-th error value
$e_{t-1}$: the $(t-1)$-th error value
$H_0$ is rejected if $d < d_L$ or $d > 4 - d_L$.
$H_0$ is accepted if $d_U < d < 4 - d_U$.
No decision is made if $d_L < d < d_U$ or $4 - d_U < d < 4 - d_L$.
Another assumption that must be met when building a regression model with several predictor variables is non-multicollinearity. Multicollinearity is the presence of a strong linear relationship among the predictor variables. It can be detected using the Variance Inflation Factor (VIF) [9], based on the hypotheses:
$H_0$: there is no multicollinearity between the predictor variables
$H_1$: there is multicollinearity between the predictor variables
$$VIF_j = \frac{1}{1 - R_j^2}; \quad j = 1, 2, \cdots, p \tag{9}$$
where $R_j^2$ is the coefficient of determination obtained by regressing the $j$-th predictor on the other predictors. If the VIF is less than 10, $H_0$ is accepted, so the non-multicollinearity assumption is met.
3. If the homoscedasticity assumption is not met, estimate the regression parameters using REML.
Restricted Maximum Likelihood is an alternative method for estimating variance parameters, derived from the Maximum Likelihood Method (MLM) [2]. The parameters estimated by REML fall into two parts: the fixed-effects parameters $\beta$ and the variance $\sigma^2$. For a random sample from a normal distribution with mean $\mu = X\beta$, the likelihood in Equation (6) is a function of the $(p + 1)$ regression parameters $\beta$. REML estimates the variance from a form of this function that is free of $\beta$.
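As an illustration of how a likelihood that is free of β leads to an unbiased variance estimate, the following is a minimal sketch for the simplest case only: independent normal errors with a single common variance. The heteroscedastic variance structure used in this study is not reproduced here, and the function names and the synthetic data are hypothetical. In this special case the restricted likelihood is maximized at RSS / (n − p − 1), whereas ordinary maximum likelihood gives RSS / n.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def neg_restricted_loglik(log_sigma2, rss, n, k, logdet_xtx):
    """Negative restricted log-likelihood for y ~ N(X beta, sigma2 * I).

    Parameterized by log(sigma2) so the optimizer can search the whole real line.
    """
    sigma2 = np.exp(log_sigma2)
    return 0.5 * ((n - k) * np.log(2.0 * np.pi * sigma2) + logdet_xtx + rss / sigma2)


def reml_homoscedastic(X, y):
    """REML sketch under the ASSUMPTION of one common error variance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # add intercept column
    n, k = design.shape                              # k = p + 1 parameters
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta_hat) ** 2))
    _, logdet_xtx = np.linalg.slogdet(design.T @ design)
    result = minimize_scalar(neg_restricted_loglik, args=(rss, n, k, logdet_xtx))
    return {
        "beta": beta_hat,
        "sigma2_reml": float(np.exp(result.x)),  # numerical maximizer
        "sigma2_closed_form": rss / (n - k),     # known REML solution in this case
        "sigma2_ml": rss / n,                    # biased ML estimate, for comparison
    }


# Synthetic data (hypothetical, not the study data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 2))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0]) + rng.normal(scale=2.0, size=60)
fit_demo = reml_homoscedastic(X_demo, y_demo)
```

The numerical optimum and the closed form agree, and both exceed the ML estimate because REML divides by the residual degrees of freedom rather than by n.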

RESULTS AND DISCUSSION

Estimation of Regression Parameter Using OLS
Linear regression analysis is a statistical method for modeling the relationship between the response variable and the predictor variables [5]. The relationship derived from regression analysis can be used to describe the phenomenon in the data. In this study, the model obtained using the Ordinary Least Squares (OLS) method is: $\hat{Y} = -1855 + 121.5 X_1 + 0.0098 X_2 + 0.0524 X_3 + 0.559 X_4 - 0.585 X_5$
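For readers who want to reproduce this kind of fit, the following is a minimal sketch of the OLS estimator from the Methods section, obtained by solving the normal equations. The function name and the synthetic data are hypothetical; the study data are not available here, so the coefficients above are not reproduced.

```python
import numpy as np


def ols_estimate(X, y):
    """Return the OLS coefficients (intercept first) for y regressed on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    # Solve (X'X) beta = X'y directly rather than forming the inverse explicitly.
    return np.linalg.solve(design.T @ design, design.T @ y)


# Synthetic, noise-free data (hypothetical): y = 2 + 3*x1 - 1*x2
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 2.0 + 3.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]
beta_demo = ols_estimate(X_demo, y_demo)
```

Because the demonstration data contain no noise, the recovered coefficients match the generating values up to rounding error. For ill-conditioned predictor matrices, `np.linalg.lstsq` is a more stable alternative to solving the normal equations.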

Testing of The Classical Assumption of Multiple Linear Regression Analysis
The normality of the errors was tested with the Shapiro-Wilk test, based on the hypotheses $H_0$: the errors are normally distributed, and $H_1$: the errors are not normally distributed. The test gave a p-value of 0.294 and a Shapiro-Wilk test statistic G of 0.9784. Because $p > \alpha$ and G is less than the Z critical point (1.96), $H_0$ is accepted, and it is concluded at the 95% confidence level that the errors are normally distributed.
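A normality check of this kind can be sketched with SciPy as follows. Note the assumptions: `scipy.stats.shapiro` reports the Shapiro-Wilk W statistic and its p-value, not the normal-approximated G statistic used above, so only the p-value decision rule is shown. The helper name and the simulated residuals are hypothetical.

```python
import numpy as np
from scipy import stats


def shapiro_wilk_check(residuals, alpha=0.05):
    """Run the Shapiro-Wilk test and report whether H0 (normality) is retained."""
    w_stat, p_value = stats.shapiro(residuals)
    # H0 is retained when the p-value exceeds the significance level.
    return float(w_stat), float(p_value), bool(p_value > alpha)


# Simulated residuals (hypothetical, not the study residuals).
rng = np.random.default_rng(1)
demo_residuals = rng.normal(loc=0.0, scale=1.0, size=100)
w_stat, p_value, h0_retained = shapiro_wilk_check(demo_residuals)
```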

Estimation of Regression Parameter Using REML
Restricted Maximum Likelihood (REML) is an alternative method for estimating variance parameters, derived from the Maximum Likelihood Method (MLM) [2].

Model Validation
To check whether the model obtained in this study agrees with actual conditions in the field, model validation was performed. The predicted values from the REML method were compared with the actual observed values using a paired t test. A summary of the paired t test is presented in Table 3. $H_0$ is accepted because the p-value is larger than 0.05, which leads to the conclusion that the model from the REML method can be used to predict student's saving.
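The validation step can be sketched with a paired t test from SciPy. This is an illustration only: the helper name and the data below are hypothetical, constructed so that the prediction errors average to zero, and they are not the values behind Table 3.

```python
import numpy as np
from scipy import stats


def validate_predictions(actual, predicted, alpha=0.05):
    """Paired t test of H0: mean(predicted - actual) = 0."""
    t_stat, p_value = stats.ttest_rel(predicted, actual)
    # The model is considered usable for prediction when H0 is not rejected.
    return float(t_stat), float(p_value), bool(p_value > alpha)


# Hypothetical data: prediction errors alternate between +5 and -5.
actual_demo = np.linspace(100.0, 900.0, 30)
offsets = np.tile([5.0, -5.0], 15)
predicted_demo = actual_demo + offsets
t_stat, p_value, model_usable = validate_predictions(actual_demo, predicted_demo)
```

A paired test is appropriate here because each prediction is matched to the observation it was made for.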

The Durbin-Watson test was used to test the hypotheses $H_0$: $\rho = 0$ (errors are independent) and $H_1$: $\rho \neq 0$ (errors are not independent). The test statistic is d = 1.928. From the Durbin-Watson table, dL = 1.464 and dU = 1.768. Because the test statistic lies between dU and 4 − dU (1.768 < 1.928 < 2.232), $H_0$ is accepted, and it can be concluded that the non-autocorrelation assumption is met. The Variance Inflation Factor (VIF) is one of the values used to detect the presence of multicollinearity [9]. The hypotheses tested are $H_0$: no multicollinearity, and $H_1$: multicollinearity.
Table 1. VIF value of each predictor variable
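The diagnostics used in this step can be sketched as follows. The Durbin-Watson decision is applied to the values reported above (d = 1.928, dL = 1.464, dU = 1.768). All function names and the synthetic data are hypothetical. The exact form of the Glejser test statistic used in this study is not shown in the text, so the overall F test of the auxiliary regression below is an assumption.

```python
import numpy as np
from scipy import stats


def _ols_residuals(X, y):
    """Residuals of an OLS fit of y on X with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def glejser_test(X, residuals):
    """ASSUMED form: overall F test of the regression of |e| on the predictors."""
    X = np.asarray(X, dtype=float)
    abs_e = np.abs(np.asarray(residuals, dtype=float))
    n, p = X.shape
    sse = np.sum(_ols_residuals(X, abs_e) ** 2)
    sst = np.sum((abs_e - abs_e.mean()) ** 2)
    f_stat = ((sst - sse) / p) / (sse / (n - p - 1))
    return float(f_stat), float(stats.f.sf(f_stat, p, n - p - 1))


def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


def durbin_watson_decision(d, d_lower, d_upper):
    """Two-sided decision rule using tabulated bounds dL and dU."""
    if d < d_lower or d > 4 - d_lower:
        return "reject H0"
    if d_upper < d < 4 - d_upper:
        return "do not reject H0"
    return "no decision"


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing each predictor on the others."""
    X = np.asarray(X, dtype=float)
    values = []
    for j in range(X.shape[1]):
        resid = _ols_residuals(np.delete(X, j, axis=1), X[:, j])
        r2 = 1.0 - np.sum(resid ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        values.append(1.0 / (1.0 - r2))
    return np.array(values)


# Decision for the values reported in the text.
decision = durbin_watson_decision(1.928, 1.464, 1.768)

# Synthetic data (hypothetical) to exercise the other functions.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(200, 3))
y_demo = 1.0 + X_demo @ np.array([0.5, -0.2, 0.3]) + rng.normal(size=200)
e_demo = _ols_residuals(X_demo, y_demo)
f_demo, p_demo = glejser_test(X_demo, e_demo)
vif_demo = vif(X_demo)
```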

The parameters estimated by REML fall into two parts: the fixed-effects parameters $\beta$ and the variance $\sigma^2$. The model obtained using the Restricted Maximum Likelihood (REML) method is: $\hat{Y} = -1609 + 112 X_1 + 0.0088 X_2 + 0.0504 X_3 + 0.4706 X_4 - 0.636 X_5$. The results of testing each parameter using the Wald test are shown in Table 2.
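To show how the fitted REML model is used for prediction, the following sketch evaluates it for one student. The coefficients are those reported above; the function name and the input values are hypothetical and chosen only for illustration. Monetary inputs are in thousand rupiah, as defined in the Methods section.

```python
# Coefficients of the REML model reported in the text.
REML_COEFFICIENTS = {
    "intercept": -1609.0,
    "age": 112.0,                 # X1, years
    "parent_income": 0.0088,      # X2, thousand rupiah
    "pocket_money": 0.0504,       # X3, thousand rupiah
    "additional_income": 0.4706,  # X4, thousand rupiah
    "consumption": -0.636,        # X5, thousand rupiah
}


def predict_saving(age, parent_income, pocket_money, additional_income, consumption):
    """Evaluate the fitted REML regression equation for one student."""
    c = REML_COEFFICIENTS
    return (
        c["intercept"]
        + c["age"] * age
        + c["parent_income"] * parent_income
        + c["pocket_money"] * pocket_money
        + c["additional_income"] * additional_income
        + c["consumption"] * consumption
    )


# Hypothetical student: 20 years old, parent income 3000, pocket money 1000,
# additional income 200, consumption 900 (all money in thousand rupiah).
predicted_demo = predict_saving(20, 3000, 1000, 200, 900)  # approximately 229.52
```

As noted in the Methods section, such predictions are meaningful only for inputs within the range of the data used to fit the model.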

Table 2. Results of the partial test using the REML method

Table 3. Paired t test of predicted values from the REML method and actual values