Validity And Reliability Rubric Of Performance Assessment Geometry Study In Junior High School Using The Many Facet Rasch Model Approach
DOI: https://doi.org/10.24235/eduma.v9i2.7100

Keywords: performance assessment, rubric, many-facet Rasch measurement model (MFRM)

Abstract
The purpose of this study was to analyze the validity and reliability of a performance-assessment rubric for geometry subject matter using the many-facet Rasch model approach, implemented in the Facets software. Data were collected from 100 junior high school students in a small-scale trial and 250 students in a large-scale trial, scored by 3 raters. The performance-assessment instrument, in the form of a rubric, was used to assess each student's process of working through the problems; each problem had its own rubric. The many-facet Rasch measurement model (MFRM) was used to analyze the data across three aspects, namely the person facet, rater agreement, and the difficulty domain, using the Facets program. For the rater facet, the separation ratio was 4.96 and the separation index was 2.15, indicating that the raters are reliably separated. The strata index of 3.21 indicates three statistically distinct strata of rater severity in this sample of 4 raters. For rater agreement, the rater separation reliability was 0.87 and the correlation between each rater and the others ranged from 0.40 to 0.63, indicating adequate agreement among the raters in assessing test takers at their level of competence. The difficulty domain on the variable map shows that the range from difficult to easy spans roughly +1 to −1 logits.
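The strata and reliability figures reported above can be related to the separation index through the standard Rasch separation formulas (Wright's relations). The sketch below is a generic illustration of those relations, not a reproduction of the Facets computation; the input value 2.15 is the rater separation index quoted in the abstract.

```python
# Standard many-facet Rasch separation statistics (generic formulas,
# not the exact internal computation performed by the Facets program).

def separation_reliability(g: float) -> float:
    """Separation reliability R = G^2 / (1 + G^2), for separation index G."""
    return g * g / (1.0 + g * g)

def strata_index(g: float) -> float:
    """Number of statistically distinct strata, H = (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0

if __name__ == "__main__":
    g = 2.15  # rater separation index reported in the abstract
    # (4 * 2.15 + 1) / 3 = 3.20, consistent with the reported strata of 3.21
    print(f"strata index H = {strata_index(g):.2f}")
    print(f"separation reliability R = {separation_reliability(g):.2f}")
```

With G = 2.15, the strata formula yields about 3.2, matching the three statistically distinct levels of rater severity described in the abstract.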
License

This work is licensed under a Creative Commons Attribution 4.0 International License.