Reliability Assessment of Arabic Speech Contest

Andhita Dessy Wulansari


This study's assessment model for the Arabic speech competition involved three competent judges, each of whom assessed the same three components: al fashahah (fluency), lubbul maudhu' (content/theme discussion), and al harakah (participant movement, including expression). The subjects were all 24 participants in the Arabic speech competition held at IAIN Ponorogo in 2019, all of whom came from MA (Madrasah Aliyah) schools around Madiun. Because each component is scored on a range of 50 to 100, the scores given by the three judges may differ, and such a wide range could affect the consistency of the assessment. It is therefore crucial to estimate the reliability coefficient of the judges' assessments. This study uses a quantitative approach because the primary data are the judges' scores. Reliability is estimated with an analysis-of-variance approach based on generalizability theory (G-Theory), using a G-study with a multifaceted design. This theory can help improve an instrument's quality by examining several sources of variance and the consistency of the resulting generalizability coefficient. The results indicate that the assessment instrument used in the competition is feasible to use, on the grounds of both its validity and its reliability. Expert analysis of the instrument's content judged it valid, and the reliability coefficient of the combined score is 0.96708. The study therefore concludes that the assessment instrument is reliable.
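To make the variance-analysis idea concrete, the following is a minimal sketch of how a relative generalizability coefficient could be computed for a simplified one-facet design in which every participant is scored by every judge. It is not the study's procedure: the study reports a multifaceted design (participants, judges, and components), whereas this sketch treats judges as the only facet. The function name `g_coefficient` and the score table are hypothetical, chosen only for illustration, and the numbers are not the competition data.

```python
def g_coefficient(scores):
    """Relative G coefficient for a crossed participants x judges design.

    scores: list of rows, one row per participant, one column per judge.
    Assumes a complete table (every judge scores every participant).
    """
    n_p = len(scores)       # number of participants
    n_r = len(scores[0])    # number of judges (raters)

    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Estimated variance components; negative estimates are set to zero.
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_res = max(ms_res, 0.0)

    denom = var_p + var_res / n_r
    if denom == 0:
        return 0.0  # no variance at all: the coefficient is undefined, report 0
    return var_p / denom


# Hypothetical scores in the 50-100 range: 4 participants, 3 judges.
example_scores = [
    [70, 72, 71],
    [85, 88, 86],
    [60, 63, 61],
    [90, 92, 95],
]
print(g_coefficient(example_scores))
```

In this simplified form the coefficient is the share of observed-score variance attributable to real differences among participants once scores are averaged over judges. A judge who is uniformly stricter or more lenient than the others does not lower it, because only the relative ordering of participants matters. A full analysis of the design described here would also estimate variance components for the scoring components and for the interactions.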


Reliability; Generalizability Theory; Speech Contest; Arabic




