Models and designs for tests with explanatory rules for their item difficulties.
Symposium organised by Wim van der Linden, CTB/McGraw-Hill, Monterey,
Chair: Wim van der Linden, Thursday 23rd July, 9.40 - 11.00, Palmerston Lecture Theatre, Fisher Building.

Johannes Hartig, Department of Educational Research Methodology,
Cees Glas, Department of Research Methodology, Measurement, and Data Analysis,
Hanneke Geerlings, Department of Research Methodology, Measurement and Data Analysis
Andreas Frey and Nicki-Nils Seitz, Leibniz Institute for Science Education (IPN), Kiel, Germany. Classification of individuals using multidimensional adaptive testing with feedforward.
ABSTRACTS
Introduction: Models and designs for tests with explanatory rules for their item difficulties
Wim J van der Linden
A recent development in educational and psychological testing is to specify explanatory rules for the item difficulties and to base the design, administration, and scoring of tests directly on empirical estimates of the effects of these rules rather than on calibration of the individual items. The first paper presents results from a study in which the effects of the explanatory rules were modeled as fixed effects in a two-dimensional linear logistic model. The model was successfully applied to tests of reading and listening comprehension in English as a second language. The second paper presents a hierarchical model for item cloning, with families of items based on the same explanatory rules but with additional variation in their surface characteristics. It shows how the model can be estimated and validated using marginal maximum likelihood estimation and Lagrange multiplier tests. The last two papers address the design of rule-based tests. The first of these introduces a linear structure in the hierarchical model for item cloning to explain the differences between item families and shows how optimal design principles can be used to automatically generate a fixed test from a pool of item families. The last paper addresses the case of a multidimensional ability structure and focuses on how an adaptive test design can be used to optimize classification decisions about test takers.
Explanatory Models for Item Difficulties in
Johannes Hartig, Claudia Harsch and Jane Höler
MML Estimation and Lagrange Multiplier Tests for Item-Cloning Models
Cees A.W. Glas
Optimal Design of Tests with Rule-Based Item Generation
Hanneke Geerlings and
The possibilities of optimal test design with automatic item generation were examined. Two methods of item generation were addressed. The first method assumed rule-based generation of the items. Statistically, the rules were assumed to have a fixed effect on the difficulty of the items. A well-known response model for such items is the linear logistic test model. The second method was item cloning, which leads to families of items that differ only in surface features. Statistically, the items within these families are assumed to vary only randomly in difficulty. A hierarchical response model was developed that accounts for the fact that items are grouped in families created through the joint application of the two types of item generation. The main goal of the presentation is to demonstrate the use of the model in optimal test assembly. In particular, the effect of random instead of fixed item parameters on the optimization model and its solution was investigated. Keywords: Optimal design; optimal test assembly; item response theory; automated item generation.
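The two sources of item difficulty described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes a Rasch-type response function, a family difficulty formed as a weighted sum of rule effects (in the spirit of the linear logistic test model), and normally distributed clone difficulties within a family. All function names, parameter values, and the Monte Carlo averaging are hypothetical choices made for illustration only.

```python
import math
import random


def lltm_difficulty(rule_weights, rule_effects):
    """Family difficulty as a weighted sum of fixed rule effects (LLTM-style)."""
    return sum(w * e for w, e in zip(rule_weights, rule_effects))


def rasch_probability(theta, b):
    """Probability of a correct response under a Rasch-type model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_family_probability(theta, family_mean, family_sd,
                                n_draws=2000, seed=0):
    """Average success probability over randomly drawn clones of one family.

    Clone difficulties are drawn from a normal distribution around the
    family mean; this distributional choice is an assumption of the sketch.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        clone_difficulty = rng.gauss(family_mean, family_sd)
        total += rasch_probability(theta, clone_difficulty)
    return total / n_draws


# Hypothetical family built from rules 1 and 3 out of three rules.
family_mean = lltm_difficulty([1, 0, 1], [0.5, 1.0, -0.2])
p_fixed = rasch_probability(1.0, family_mean)
p_random = expected_family_probability(1.0, family_mean, family_sd=1.0)
print(round(family_mean, 2), p_fixed, p_random)
```

With a within-family standard deviation of zero the averaged probability reduces to the fixed-parameter value, which is one simple way to see where random item parameters enter a test assembly objective.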
Classification of Individuals using Multidimensional Adaptive Testing with Feedforward
Andreas Frey, Nicki-Nils Seitz
Multidimensional adaptive testing (MAT) can be used to classify individuals into two or more categories (such as pass/fail) on multiple dimensions. One possible stopping criterion is to present items until the probability of an incorrect classification falls below a predefined level (e.g. 5%). However, for ability estimates near a cut-off point, the maximum number of allowed items may be presented without reaching the desired classification certainty. To avoid such unnecessarily long tests, a feedforward strategy can be applied, which checks whether the target level of classification certainty would be reached if the following answers were all correct or all incorrect, and stops the test early if this would not be the case. In a simulation study, the measurement efficiency of MAT with and without feedforward was compared both to one-dimensional adaptive testing (CAT) and to sequential testing with a fixed item set (FIT). Efficiency was calculated as the ratio of the percentage of correct classifications to the number of items presented. The lowest efficiency was obtained for FIT (2.29). For CAT and especially for MAT, the efficiency was substantially higher. Use of MAT with the feedforward strategy led to an additional gain in efficiency. The practical consequences of incorporating MAT with a feedforward strategy relative to those for CAT and FIT will be discussed. Keywords: Multidimensional adaptive testing; item response theory; computerized adaptive testing; adaptive classification testing.
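The feedforward check described in this abstract can be sketched in Python as follows. This is a deliberately simplified illustration and not the authors' procedure: it assumes a single ability dimension rather than several, a Rasch-type response model, a standard normal prior evaluated on a grid, and a single cut-off. The function names, the grid, and the 0.95 target are hypothetical.

```python
import math

# Ability grid from -4.0 to 4.0 in steps of 0.1 (an assumption of this sketch).
GRID = [i / 10 for i in range(-40, 41)]


def rasch_p(theta, b):
    """Probability of a correct response under a Rasch-type model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def posterior(responses):
    """Grid posterior of ability given (difficulty, score) pairs, N(0, 1) prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)
        for b, score in responses:
            p = rasch_p(theta, b)
            w *= p if score == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def certainty(responses, cut):
    """Posterior probability of the more likely side of the cut-off."""
    post = posterior(responses)
    p_above = sum(w for theta, w in zip(GRID, post) if theta > cut)
    return max(p_above, 1.0 - p_above)


def feedforward_says_stop(responses, remaining_difficulties, cut, target=0.95):
    """Stop early if even all-correct or all-incorrect answers to the
    remaining items could not reach the target classification certainty."""
    all_correct = responses + [(b, 1) for b in remaining_difficulties]
    all_incorrect = responses + [(b, 0) for b in remaining_difficulties]
    best = max(certainty(all_correct, cut), certainty(all_incorrect, cut))
    return best < target


# A test taker close to the cut-off with mixed responses so far.
so_far = [(0.0, 1), (0.0, 0), (0.0, 1), (0.0, 0)]
print(feedforward_says_stop(so_far, [0.0], cut=0.0))        # one item left
print(feedforward_says_stop(so_far, [0.0] * 20, cut=0.0))   # twenty items left
```

In this sketch the check is applied to the most extreme response patterns only, so it stops the test only when no continuation at all could reach the target; a multidimensional version would apply the same idea to each dimension's classification.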