The Accuracy of Estimating Parameters of Multiple-Choice Test Items, Following Item-Response Theory: A Simulation Study
Article Number: e2025054 | Published Online: February 2025 | DOI: 10.22521/edupij.2025.14.54
Aiman Mohammad Freihat , Omar Saleh Bani Yassin
Abstract
Background/purpose. This study aimed to examine the accuracy with which the parameters of multiple-choice test items are estimated under the models of item-response theory. Materials/methods. The researchers relied on measurement-accuracy indicators that express the absolute difference between the estimated and true values of the item parameters, namely the root mean square error (RMSE) and the relative efficiency (RE) of the variances of the estimated parameters. Responses of 1,500 examinees were generated under the assumption of a normally distributed ability parameter. Several 50-item multiple-choice tests were generated, with difficulty parameters drawn from a normal distribution and discrimination and guessing parameters drawn from uniform distributions, using the WinGen V.3 data-generation program. The BILOG-MG software was used to estimate the item parameters by the marginal maximum likelihood method, and the estimated parameters were then compared with the true parameters using the two indicators. Results. The study results showed that the three-parameter model was the most accurate in estimating the difficulty parameter, followed by the one-parameter model and then the two-parameter model.
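For reference, the two accuracy indicators in their conventional forms (the abstract does not state the exact formulas, so these standard definitions are a sketch): for a generic item parameter v with true values v_i and estimates \hat{v}_i over n items,

\[ \mathrm{RMSE}(\hat{v}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{v}_i - v_i\right)^2}, \qquad \mathrm{RE}(A,B) = \frac{\operatorname{Var}\big(\hat{v}^{(A)}\big)}{\operatorname{Var}\big(\hat{v}^{(B)}\big)}, \]

where a smaller RMSE indicates more accurate estimation and RE compares the variances of the parameter estimates produced by two models A and B.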
Conclusion. The results showed that the three-parameter model was more accurate than the two-parameter model. The guessing parameter is estimated only under the three-parameter model; its estimates were most accurate in the five-alternative tests, followed by the three-alternative tests and then the four-alternative tests.
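For context, the three-parameter logistic (3PL) model referred to in the results gives the probability of a correct response to item i for an examinee of ability \theta as

\[ P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-D a_i(\theta - b_i)}}, \]

where b_i is the difficulty, a_i the discrimination, c_i the guessing (pseudo-chance) parameter, and D \approx 1.7 a scaling constant; the two- and one-parameter models follow by fixing c_i = 0 and, additionally, a_i to a common value.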
Keywords: Estimation, item parameter, item-response theory, models, multiple-choice test