Deep-Rasch as an Alternative to Rasch Modeling under Assumption Violations and Small Sample Sizes
Abstract
Keywords
DOI: https://doi.org/10.18860/cauchy.v10i2.36276
Copyright (c) 2025 Agus Santoso, Farit Mochamad Afendi, Timbul Pardede, Heri Retnawati, Ibnu Rafi, Ezi Apino, Munaya Nikma Rosyada

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.