Computer Adaptive Testing: The influence of early and pretest items (10)

Chair: Mark Reckase, Thursday 23rd July, 14.55 - 16.15, Dirac Room, Fisher Building.

Huijuan Meng, Susan Steinkamp, Pearson VUE, Bloomington, MN, USA, Joy Matthews-Lopez, National Association of Boards of Pharmacy (NABP), USA. A comparison study of CAT pretest item linking designs. (096)

Mariagiulia Matteucci, Statistics Department, University of Bologna, Italy, Bernard P. Veldkamp, University of Twente, The Netherlands. Including prior information in CAT administration. (200)

Ivailo Partchev, Institute of Psychology, University of Jena, Germany. Some new challenges in research on computerized adaptive testing. (144)

Mark Reckase and Wei He, Michigan State University, USA. The influence of item pool quality on the functioning of Computerized Adaptive Tests (CATs). (014)

ABSTRACTS

A comparison study of CAT pretest item linking designs. (096)
Huijuan Meng, Susan Steinkamp and Joy Matthews-Lopez
The main purpose of this study is to evaluate the performance of three IRT pretest item parameter linking designs for data collected from a CAT exam: common-item linking with the Stocking-Lord transformation (CI), fixed item parameters (FI), and fixed person parameters (FP). Two factors will be included in order to investigate how different sample-size criteria for selecting operational and pretest items affect the calibration results. More specifically, two datasets will be created for operational items: (1) items with N>=500 and (2) items with N>=1,000. For pretest items, two datasets will be compiled: (1) items with N>=400 and (2) items with N>=500. In this study, real data from a large-scale national exam will be analyzed and differences across the twelve resulting sets of item parameter estimates will be examined. Simulated data will also be generated under the 3PL model and analyzed to compare the performance of these designs on item parameter recovery. Average absolute bias across items, computed over 20 replications, will be used to evaluate the accuracy of the derived parameter estimates.
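
A minimal sketch of the Stocking-Lord step named above, assuming Python with numpy and scipy; the common-item parameters, the true transformation constants, and all variable names below are invented for illustration, not the authors' data or code. The transformation (A, B) is chosen to minimize the squared distance between the two test characteristic curves over a grid of abilities.

    import numpy as np
    from scipy.optimize import minimize

    def p3pl(theta, a, b, c):
        # 3PL item response function, vectorized over a theta grid and an item set
        return c + (1 - c) / (1 + np.exp(-a * (theta[:, None] - b)))

    def stocking_lord_loss(params, theta, new, old):
        # Squared difference between test characteristic curves after rescaling
        A, B = params
        a_n, b_n, c_n = new
        a_o, b_o, c_o = old
        tcc_old = p3pl(theta, a_o, b_o, c_o).sum(axis=1)
        tcc_new = p3pl(theta, a_n / A, A * b_n + B, c_n).sum(axis=1)
        return np.sum((tcc_old - tcc_new) ** 2)

    rng = np.random.default_rng(1)
    n = 20                                  # hypothetical common items
    a_old = rng.lognormal(0.0, 0.3, n)
    b_old = rng.normal(0.0, 1.0, n)
    c_old = np.full(n, 0.2)
    A_true, B_true = 1.1, 0.25              # scale shift to be recovered
    new = (a_old * A_true, (b_old - B_true) / A_true, c_old)

    theta = np.linspace(-4, 4, 41)
    res = minimize(stocking_lord_loss, x0=[1.0, 0.0],
                   args=(theta, new, (a_old, b_old, c_old)))
    print("recovered (A, B):", res.x)       # close to (1.1, 0.25)

In a recovery study like the one described, the average of |estimate - generating value| across items, taken over the replications, would then summarize each design's accuracy.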

Including prior information in CAT administration. (200)
Mariagiulia Matteucci and Bernard P. Veldkamp
In this work, the use of empirical prior information in computerized adaptive testing (CAT) is investigated. Besides the individual responses, background variables concerning the examinees may be available and can be introduced into the estimation process. Whenever a strong relationship between ability and the covariates is identified, this collateral information can be included both in the initialization and in the ability estimation within CAT. Commonly, item selection in CAT is performed by adopting the maximum-information criterion. A serious consequence of applying this criterion is over-exposure of the first item. Moreover, while research has been oriented towards using CAT to shorten test length, adaptive testing shows a particular weakness when dealing with short tests. In the current work, a simulation study has been carried out to compare the accuracy of the ability estimates with and without the inclusion of prior information. Special attention is given to the case of short tests. The findings show that, when empirical information is introduced, ability estimates are more precise, i.e., mean standard deviations are lower. Furthermore, the introduction of individual prior information reduces first-item over-exposure, because the ability estimate is initialized with reference to the examinee's covariates.
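
A brief sketch of the initialization described above, assuming Python with numpy; the covariate values, regression weights, residual standard deviation, and item parameters are all invented for the example. The prior mean is predicted from the examinee's background variables, so the first item is targeted near that prediction rather than near zero for every examinee, and EAP estimation then proceeds against the empirical prior.

    import numpy as np

    def p2pl(theta, a, b):
        # 2PL response probability
        return 1 / (1 + np.exp(-a * (theta - b)))

    def eap(grid, prior, responses, a, b):
        # Expected a posteriori ability estimate on a quadrature grid
        like = np.ones_like(grid)
        for u, ai, bi in zip(responses, a, b):
            p = p2pl(grid, ai, bi)
            like *= p if u else 1 - p
        post = prior * like
        post /= post.sum()
        return (grid * post).sum()

    grid = np.linspace(-4, 4, 81)

    # Empirical prior: mean predicted from background variables (invented numbers)
    x = np.array([1.0, 0.0, 1.0])       # examinee covariates
    beta = np.array([0.3, -0.2, 0.5])   # regression weights from a calibration sample
    mu, sigma = x @ beta, 0.8           # residual SD assumed to be 0.8
    prior = np.exp(-0.5 * ((grid - mu) / sigma) ** 2)
    prior /= prior.sum()

    # The CAT starts at mu, so the first item differs across examinees,
    # spreading first-item exposure over the pool
    a_items = np.array([1.2, 0.9])
    b_items = np.array([mu, 0.4])       # first item targeted at the prior mean
    print("EAP after responses (1, 0):", eap(grid, prior, [1, 0], a_items, b_items))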

Some new challenges in research on computerized adaptive testing. (144)
Ivailo Partchev
Several recent papers have discussed a tendency of CAT to produce, for some highly able examinees, scores that are suspiciously low. The phenomenon may be statistically confounded with gender and hence mistaken for differential item functioning. The most plausible explanation is that the change in the ability estimate after each new item rapidly decreases in absolute size as the test gets longer, making it difficult to compensate for a sharp initial dip. Further factors, such as the composition of the item pool and the choice of priors in Bayesian estimation of ability, contribute to the process in rather unpredictable ways. Serious as it is, the problem highlights some limitations in the established practice of research on CAT. New developments in CAT are typically evaluated with repeated simulations of tests at a sequence of true ability levels. The current practice of examining the conditional means and variances of the estimates given true ability may be inadequate for diagnosing what appears to be a mixture problem, while finite mixture techniques may be too unstable to serve as a practical tool for evaluating and comparing new approaches to CAT. The paper examines some possibilities for diagnosing the problem of differential underestimation in CAT and uses them to evaluate a possible countermeasure.
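
A toy simulation of the mechanism described in the abstract, sketched under assumed settings (a Rasch CAT with EAP scoring, a uniform difficulty pool, and two forced early wrong answers for a true ability of +2; none of this is taken from the paper): the printed step sizes shrink quickly as information accumulates, which is why a sharp initial dip tends not to be recovered within a short test.

    import numpy as np
    rng = np.random.default_rng(7)

    theta_true = 2.0
    grid = np.linspace(-4, 4, 161)
    prior = np.exp(-0.5 * grid**2); prior /= prior.sum()
    post, est = prior.copy(), 0.0
    b_pool = np.linspace(-3, 3, 61)                    # hypothetical difficulty pool

    for step in range(15):
        b = b_pool[np.argmin(np.abs(b_pool - est))]    # Rasch max-info: b nearest estimate
        p = 1 / (1 + np.exp(-(theta_true - b)))
        u = 0 if step < 2 else int(rng.random() < p)   # force two early wrong answers
        p_grid = 1 / (1 + np.exp(-(grid - b)))
        post *= p_grid if u else (1 - p_grid)          # sequential Bayesian update
        post /= post.sum()
        new_est = (grid * post).sum()                  # EAP estimate on the grid
        print(f"item {step+1:2d}: u={u}  step={abs(new_est - est):.3f}  est={new_est:+.3f}")
        est = new_est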

The influence of item pool quality on the functioning of Computerized Adaptive Tests (CATs). (014)
Mark Reckase and Wei He
Computerized adaptive tests (CATs) have been shown to be very efficient methods for locating persons on IRT score scales. However, the expected properties of such tests are realized only if the item pool contains the items requested by the item selection algorithm. This paper describes a process for designing an item pool that will support the desired properties of a CAT. The process will be demonstrated for CATs based on both the Rasch model and the three-parameter logistic model. The value of the item pool design process will be shown by comparing the quality of examinee location estimates from well-designed item pools with those obtained from item pools reported in the research literature on CAT. The use of the proposed process for CATs that include exposure control and content balancing will also be discussed.
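
To make the dependence on pool composition concrete, here is a small sketch assuming Python with numpy (the two pools and all item parameters are hypothetical, not the authors' design procedure): the Fisher information available from the best ten items of each pool, evaluated at several abilities, shows how a pool bunched around average difficulty starves a maximum-information algorithm at the extremes.

    import numpy as np

    def info_3pl(theta, a, b, c):
        # Fisher information of a 3PL item at ability theta
        p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))
        return a**2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

    rng = np.random.default_rng(3)
    n = 300
    a = rng.lognormal(0.0, 0.3, n)
    c = np.full(n, 0.2)
    b_designed = rng.uniform(-3, 3, n)     # difficulties spread to match demand
    b_typical = rng.normal(0, 0.8, n)      # difficulties bunched near the middle

    for theta in (-2.0, 0.0, 2.0):
        best_d = np.sort(info_3pl(theta, a, b_designed, c))[-10:].sum()
        best_t = np.sort(info_3pl(theta, a, b_typical, c))[-10:].sum()
        print(f"theta={theta:+.1f}: best-10-item info, designed={best_d:.2f} typical={best_t:.2f}")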