The Impact of Item Pool Size and Item Pool Distribution on Student Ability Estimates for a Hybrid Interim-Summative CAT

Authors

  • Garron Gianopulos, NWEA
  • Jonghwan Lee, NWEA
  • Sangdon Lim, University of Texas at Austin
  • Luping Niu, University of Texas at Austin
  • Sooyong Lee, University of Texas at Austin
  • Seung W. Choi, University of Texas at Austin

Abstract

This paper investigates the impact of a uniform versus a bell-shaped distribution of items in a computerized adaptive testing (CAT) item pool, within and across administrations, for a hybrid interim-summative assessment. Item pool sizes of 500, 800, and 1500 were simulated for both distributions. One hundred simulations were conducted for each of two grades (Grade 4 and Grade 6) in mathematics. The item pools were generated under a Rasch model for dichotomous items and a partial credit model for three-category items. Each item pool was simulated to be vertically scaled and vertically articulated across grades. The items in the pools were generated to align with the blueprints for a state test, and a targeted distribution of items across the four performance levels, based on the notion of engineered score interpretations, was implemented. For the two item pools under the normal distribution, the difficulties for Grades 4 and 6 were drawn from normal distributions with means of -0.40 and 0.40, respectively, and a standard deviation of 1.1. For the two item pools under the uniform distribution, the difficulties were drawn from uniform distributions whose minimums and maximums differed by grade: the minimums ranged from -3.6 to -2.8 and the maximums from 2.4 to 3.2. The outcome variables investigated were measurement precision (i.e., root mean square error [RMSE]), measurement accuracy (i.e., bias), item pool adaptivity, classification accuracy (true positives, true negatives, false positives, and false negatives), and item exposure rate. Both item distributions worked well overall, with slightly better results for larger pools. Pools of 800 items did not perform materially differently from pools of 1500 items. In general, results on the outcome variables were robust across all three administrations, although measurement quality degraded slightly with the 500-item pool. Implications and trade-offs in item pool composition are discussed from measurement and financial perspectives.
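
To make the pool-generation and outcome definitions above concrete, the following is a minimal Python sketch, not the authors' simulation code. It draws Rasch item difficulties for one grade-level pool from either distribution and computes bias and RMSE for a set of ability estimates. The function names, the seed, and the pairing of the uniform ranges with grades (lower range for Grade 4, higher range for Grade 6) are assumptions made for illustration; the blueprint constraints, partial credit items, and the adaptive algorithm itself are not modeled.

```python
import math
import random

# Parameters quoted in the abstract. The pairing of uniform ranges with
# grades is an assumption (lower range for Grade 4, higher for Grade 6).
NORMAL_PARAMS = {4: (-0.40, 1.1), 6: (0.40, 1.1)}   # (mean, SD)
UNIFORM_PARAMS = {4: (-3.6, 2.4), 6: (-2.8, 3.2)}   # (min, max), assumed pairing


def draw_difficulties(grade, pool_size, shape, rng):
    """Draw Rasch item difficulties for one grade-level pool (hypothetical helper)."""
    if shape == "normal":
        mean, sd = NORMAL_PARAMS[grade]
        return [rng.gauss(mean, sd) for _ in range(pool_size)]
    if shape == "uniform":
        low, high = UNIFORM_PARAMS[grade]
        return [rng.uniform(low, high) for _ in range(pool_size)]
    raise ValueError(f"unknown shape: {shape}")


def bias(true_thetas, estimates):
    """Mean signed error of the ability estimates (estimate minus true value)."""
    errors = [e - t for t, e in zip(true_thetas, estimates)]
    return sum(errors) / len(errors)


def rmse(true_thetas, estimates):
    """Root mean square error of the ability estimates."""
    squared = [(e - t) ** 2 for t, e in zip(true_thetas, estimates)]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(2025)  # arbitrary seed, chosen only for reproducibility
uniform_pool = draw_difficulties(grade=4, pool_size=500, shape="uniform", rng=rng)
normal_pool = draw_difficulties(grade=6, pool_size=1500, shape="normal", rng=rng)
```

In a full simulation, `bias` and `rmse` would be applied to the true and estimated abilities from each replication and then summarized across the one hundred replications per condition.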

Author Biography

  • Garron Gianopulos, NWEA

    Garron Gianopulos is a learning and assessment engineer at NWEA. Dr. Gianopulos has a broad interest in the practical application of IRT in the development of formative, interim, and summative assessments, and his latest research interests have focused on the use of explanatory IRT, structural equation modeling, and data visualization techniques to validate theories of learning. Prior to joining NWEA in 2018, he was a psychometrician at North Carolina State University, the North Carolina Department of Public Instruction (NCDPI), Professional Testing Inc., and the University of South Florida. While at NCDPI, Dr. Gianopulos led the development of end-of-year and end-of-course summative assessments in mathematics and interim assessments in multiple subjects. As a psychometrician at North Carolina State, he supported the development of formative diagnostic mathematics assessments centered around learning trajectories. Dr. Gianopulos also served on the North Carolina TAC until his transition to NWEA. Dr. Gianopulos holds a doctorate in curriculum and instruction with an emphasis in educational measurement and evaluation with a cognate in psychometrics from the University of South Florida.

Published

2025-03-04