Volume 16 (2025)

Development and Validation of a Critical Thinking Assessment on Temperature and Heat for Secondary Physics Education

Article Number: e2025264  |  Available Online: June 2025  |  DOI: 10.22521/edupij.2025.16.264

Bangun Sartono, Widha Sunarno, Baskoro Adi Prayitno, Nurma Yunita Indriyanti

Abstract

Background/purpose. Critical thinking (CT) is fundamental to science education, but instruments that measure CT in specific domains, such as physics, remain limited. This study develops and validates the Critical Thinking Test in Temperature and Heat (CTTH), an instrument designed to measure critical thinking skills on the physics topic of temperature and heat.

Methods. The CTTH was developed with reference to a common critical thinking test framework. Validation involved review by seven physics education experts, a small-scale pilot test with 33 students, and final validation with 720 secondary school students, to check the clarity, relevance, and psychometric quality of the test items. Instrument reliability was measured using Cronbach's alpha for internal consistency and Fleiss' kappa for inter-rater reliability.
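As an illustration of the two reliability statistics named above, the following minimal sketch computes Cronbach's alpha from a student-by-item score matrix and Fleiss' kappa from a table of rater counts per category. It uses the standard textbook formulas and is not the authors' analysis code; the function names, data layout, and any example data are assumptions made for this sketch.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for internal consistency.

    scores: list of rows, one row per student, each row a list of item scores.
    (Hypothetical layout chosen for this sketch.)
    """
    k = len(scores[0])  # number of items
    # Sample variance of each item across students
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each student's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def fleiss_kappa(counts):
    """Fleiss' kappa for agreement among a fixed number of raters.

    counts: list of rows, one row per rated response; counts[i][j] is the
    number of raters who assigned response i to category j. Every row must
    sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement for each response, then averaged
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall category proportions
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)
```

Both functions assume complete data: every student answers every item, and every response is scored by the same number of raters. Alpha is undefined when all total scores are identical, and kappa is undefined when all ratings fall in a single category, so real data would need those cases handled separately.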

Results. The CTTH showed adequate reliability in terms of both internal consistency and inter-rater reliability, supporting its use as an instrument for measuring critical thinking skills related to temperature and heat in physics learning.

Conclusion. The CTTH offers a resource for further research on integrating critical thinking into physics education. Future research is encouraged to expand the critical thinking assessment framework and to address research questions with wider application in science education.

Keywords: Critical thinking test, temperature and heat, instrument development, physics education
