Robust PCA Using MCD and MM Estimators in MARS: A Simulation Study
Abstract
Multivariate Adaptive Regression Splines (MARS) models nonlinear relationships through adaptive basis functions but remains sensitive to outliers in the predictor variables. Existing robust extensions of MARS primarily address response outliers, and the few studies that integrate Robust Principal Component Analysis (RPCA) with MARS use RPCA only for dimension reduction, without comparing robust estimators. This study evaluates RPCA as a robust predictor transformation and systematically compares two robust covariance estimators, the Minimum Covariance Determinant (MCD) estimator and the MM-estimator, within the RPCA-MARS framework. A full factorial simulation with eight predictor variables and 100 replications per condition covered 45 conditions: five sample sizes (n = 50, 100, 200, 500, 1000), three outlier proportions (5%, 10%, 25%), and three MARS interaction levels (1, 2, 3). Outliers were introduced as extreme values in a specified proportion of predictor observations. Performance was measured by Root Mean Square Error (RMSE). For analysis, the 45 conditions were collapsed into 15 scenarios by selecting, for each combination of sample size and outlier proportion, the interaction level with the minimum RMSE. The MM-estimator outperformed the MCD estimator in 8 of the 15 scenarios, achieving lower RMSE under moderate-to-high outlier contamination (10%–25%) at moderate sample sizes (n = 100–500). MCD performed better in the remaining 7 scenarios: under low contamination (5%) at n ≤ 200 and n ≥ 1000, and across all contamination levels at n = 1000. MCD showed higher variability at small sample sizes with moderate-to-high contamination, while MM produced tighter confidence intervals and lower standard deviations. Within the RPCA-MARS framework, the MM-estimator is recommended for moderately sized, highly contaminated data, while MCD is preferable under low contamination or in large-sample settings.
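The design arithmetic in the abstract (5 sample sizes × 3 outlier proportions × 3 interaction levels = 45 conditions, collapsed to 15 scenarios) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the assumption that one summary RMSE is available per condition, and the placeholder RMSE values are all hypothetical and serve only to show the bookkeeping.

```python
from itertools import product
from math import sqrt

# Factor levels as stated in the abstract.
SAMPLE_SIZES = (50, 100, 200, 500, 1000)
OUTLIER_PROPORTIONS = (0.05, 0.10, 0.25)
INTERACTION_LEVELS = (1, 2, 3)


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return sqrt(squared / len(y_true))


def design_grid():
    """All (n, outlier proportion, interaction level) combinations."""
    return list(product(SAMPLE_SIZES, OUTLIER_PROPORTIONS, INTERACTION_LEVELS))


def collapse_to_scenarios(rmse_by_condition):
    """Keep, for each (n, proportion), the interaction level with minimum RMSE.

    rmse_by_condition maps (n, proportion, level) to one summary RMSE value
    (assumed here; the abstract does not say how replications were summarized).
    Returns a dict mapping (n, proportion) to (best_level, best_rmse).
    """
    best = {}
    for (n, prop, level), value in rmse_by_condition.items():
        key = (n, prop)
        if key not in best or value < best[key][1]:
            best[key] = (level, value)
    return best


conditions = design_grid()

# Placeholder values, invented for illustration only (NOT the study's results):
# interaction level 2 is arbitrarily given the lowest RMSE in every scenario.
placeholder_rmse = {c: 1.0 + 0.1 * abs(c[2] - 2) for c in conditions}

scenarios = collapse_to_scenarios(placeholder_rmse)
print(len(conditions), len(scenarios))
```

In a comparison of two estimators, the same collapse would presumably be applied once per estimator before the scenario-wise RMSE values are compared; that detail is an assumption rather than something the abstract states.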
DOI: https://doi.org/10.18860/cauchy.v11i2.41547
Copyright (c) 2026 Uswatun Hasanah, Solimun Solimun, Atiek Iriany

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

CAUCHY: Jurnal Matematika Murni dan Aplikasi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.






