Item Response Theory: Inference about proficiencies (34)

Chair: Christof Schuster, Wednesday 22nd July, 11.45 - 13.05, Uppercroft, School of Pythagoras.

Won-Chan Lee, Robert L. Brennan and Eunjung Lee, University of Iowa, USA. Empirical evaluation of IRT ability estimators. (063)

Christof Schuster and Ke-Hai Yuan, Psychologie und Sportwissenschaft, University of Giessen. Robust estimation of latent ability in item response models. (088)

Fumiko Samejima, University of Tennessee, USA. Usefulness of the conditional distribution of the true latent trait, given its MLE. (086)

Chun Wang, Hua-Hua Chang and Keith Boughton, University of Illinois at Urbana-Champaign, USA. Some theoretical results concerning KL information in MIRT. (109)

ABSTRACTS

Empirical evaluation of IRT ability estimators. (063)
Won-Chan Lee, Robert L. Brennan and Eunjung Lee
The application of item response theory (IRT) has increased rapidly in recent years in many areas of educational and psychological testing, such as test development, item analysis, scaling, and equating. Some testing programs employ IRT pattern scoring to obtain IRT ability estimates (or scores transformed from those estimates) that are reported to examinees. Several IRT ability estimators are routinely discussed in the literature, namely MLE, EAP, and Warm’s bias-corrected MLE. Although the accuracy of these estimators is very important to test takers, little in the literature evaluates how the estimators behave in various testing situations. This study uses real test data sets to examine the relative performance of various IRT ability estimators. The factors believed to affect the ability estimators are test characteristics, sample characteristics, and model. Test characteristics include the difficulty and discrimination levels of the items, test reliability, and test length. Sample characteristics include sample size and group ability. The model factor refers to the choice of IRT model.
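
As a rough illustration of how two of the estimators named above differ, the following sketch computes a grid-search MLE and an EAP estimate for a single response pattern under the 2PL model. It is not the study's procedure: the item parameters in ITEMS, the standard normal prior, the ability range [-4, 4], and the grid sizes are all assumptions chosen for the example.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_lik(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_2pl(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def make_grid(lo, hi, steps):
    """Equally spaced ability values from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]

def mle(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search MLE, restricted to the interval [lo, hi]."""
    grid = make_grid(lo, hi, steps)
    return max(grid, key=lambda t: log_lik(t, items, responses))

def eap(items, responses, lo=-4.0, hi=4.0, steps=81):
    """Posterior mean under an assumed standard normal prior, by quadrature."""
    grid = make_grid(lo, hi, steps)
    post = [math.exp(log_lik(t, items, responses) - 0.5 * t * t) for t in grid]
    return sum(t * w for t, w in zip(grid, post)) / sum(post)

# Hypothetical (discrimination a, difficulty b) pairs, for illustration only.
ITEMS = [(1.0, -1.0), (1.2, -0.5), (0.8, 0.0), (1.5, 0.5), (1.0, 1.0)]
mixed = [1, 1, 1, 0, 0]     # some items right, some wrong
perfect = [1, 1, 1, 1, 1]   # all items right

mle_mixed, eap_mixed = mle(ITEMS, mixed), eap(ITEMS, mixed)
mle_perfect, eap_perfect = mle(ITEMS, perfect), eap(ITEMS, perfect)
print("mixed pattern:   MLE", mle_mixed, "EAP", eap_mixed)
print("perfect pattern: MLE", mle_perfect, "EAP", eap_perfect)
```

For the perfect pattern the likelihood increases without bound in theta, so the grid-search MLE simply lands on the upper limit of the search range, whereas the EAP estimate stays inside the range because of the prior. This is one of the behaviours an empirical comparison of estimators would need to account for.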

Robust estimation of latent ability in item response models. (088)
Christof Schuster and Ke-Hai Yuan
Because of response disturbances such as guessing, cheating, or carelessness, item response models can often only approximate the ‘true’ individual response probabilities. As a consequence, maximum-likelihood estimates of ability will be biased. Typically, the nature and extent of the response disturbances are unknown, so accounting for them by altering the model is not possible. Even if the nature of the response disturbances were known, accounting for them by increasing model complexity could easily lead to sample size requirements for estimation that would be difficult to meet. An approach based on weighting the contributions of the item responses to the log-likelihood function has been suggested by Mislevy & Bock (1982). This estimation approach has been shown to reduce the bias of ability estimates effectively in the presence of response disturbances. However, it is prone to producing infinite ability estimates for unexpected response patterns in which correct answers are sparse. An alternative robust estimator of ability is suggested. Limited simulation studies show that the two estimators are equivalent when evaluated in terms of mean squared error. However, the proposed estimator is much less likely to produce infinite estimates.
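
The general idea of weighting each item's contribution to the log-likelihood can be sketched as follows. This is a generic illustration only: it reproduces neither the estimator of Mislevy & Bock (1982) nor the alternative proposed in this paper. The Tukey-style biweight function, the tuning constant c, the use of a(theta - b) as the distance measure, the iteration scheme, and the item parameters are all assumptions made for the example.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def weighted_log_lik(theta, items, responses, weights):
    """Log-likelihood in which item i contributes with weight weights[i]."""
    total = 0.0
    for (a, b), u, w in zip(items, responses, weights):
        p = p_2pl(theta, a, b)
        total += w * (math.log(p) if u == 1 else math.log(1.0 - p))
    return total

# Ability grid on [-4, 4]; the range is an assumption of this sketch.
GRID = [-4.0 + 8.0 * i / 800 for i in range(801)]

def weighted_mle(items, responses, weights):
    """Grid-search maximizer of the weighted log-likelihood."""
    return max(GRID, key=lambda t: weighted_log_lik(t, items, responses, weights))

def biweight(r, c=4.0):
    """Tukey-style biweight: 1 at r = 0, falling smoothly to 0 for |r| >= c."""
    return (1.0 - (r / c) ** 2) ** 2 if abs(r) < c else 0.0

def robust_estimate(items, responses, c=4.0, n_iter=10):
    """Alternate between recomputing weights and re-maximizing."""
    theta = weighted_mle(items, responses, [1.0] * len(items))  # plain MLE start
    for _ in range(n_iter):
        # Items far from the current ability estimate are down-weighted.
        weights = [biweight(a * (theta - b), c) for a, b in items]
        theta = weighted_mle(items, responses, weights)
    return theta

# Hypothetical items (a, b); the examinee misses the easiest item.
ITEMS = [(1.0, -2.5), (1.0, -1.0), (1.0, -0.5), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0)]
RESP = [0, 1, 1, 1, 1, 0]

plain_hat = weighted_mle(ITEMS, RESP, [1.0] * len(ITEMS))
robust_hat = robust_estimate(ITEMS, RESP)
print("plain MLE:", plain_hat, "weighted estimate:", robust_hat)
```

Because the search is confined to a finite grid, this sketch can only return a boundary value where an unrestricted weighted estimator would diverge, so it illustrates the weighting step rather than the problem of infinite estimates discussed above.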

Usefulness of the conditional distribution of the true latent trait, given its MLE. (086)
Fumiko Samejima
Samejima (1998) proposed the rationale for three different approaches to the nonparametric estimation of the operating characteristics (OPCs) of graded response items, which include the item characteristic functions (ICFs) of dichotomous responses. She used one of them, the conditional p.d.f. approach, in her nonparametric on-line item calibration in computerized adaptive testing (CAT), using realistic simulated data, in work funded by the Law School Admission Council in 1999-2001. The outcomes turned out to be promising, as will be demonstrated in this paper, without requiring very many hypothetical examinees or very many items in the item pool. The method includes, among many other devices, a truncated 2PL model, usable in lieu of the three-parameter logistic model (3PL), that contributes to reducing ability estimation errors. Above all, the use of the approximated conditional distribution of the true latent trait or ability, given its maximum likelihood estimate (MLE), in the conditional p.d.f. approach appears to be the biggest reason for the success in accurate estimation. This paper focuses on this point, showing why and how this is possible, and encourages other researchers to use the approach.

Some theoretical results concerning KL information in MIRT. (109)
Chun Wang, Hua-Hua Chang and Keith Boughton
This research investigates the theoretical relationship between Fisher Information (FI) and Kullback-Leibler Information (KL) in Multidimensional Item Response Theory (MIRT). Three theorems and their potential applications in Multidimensional Adaptive Testing (MAT) are introduced. The first theorem shows that in MIRT the complete FI matrix can be fully recovered from KL. Each diagonal element of the matrix equals the curvature of the KL Information curve evaluated at the corresponding theta. The second theorem shows that if the KL Information index (KI) developed by Veldkamp & van der Linden (2002) is adopted for item selection, then in the two-dimensional case the size of KI depends largely upon a function of the discrimination parameters. The third theorem relates KI to the item difficulty: KI is maximized when the item difficulty equals the linear combination of the theta elements. The theoretical results imply that KL maintains a global information measure in MIRT, indicating that KL is more informative than FI. Furthermore, the relationship between KL and item parameters will greatly simplify the computation in item selection in MAT, in particular the computationally intensive multiple integrations described in Veldkamp & van der Linden (2008). In addition, the theoretical results may help practitioners develop better item selection methods in MAT and manage constraint control more efficiently without sacrificing estimation accuracy. Further development for future large-scale application will be discussed.
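
The link between KL curvature and FI described in the first theorem can be checked numerically for a single item. The sketch below is not the authors' derivation. It assumes a two-dimensional compensatory logistic item with logit a1*theta1 + a2*theta2 + d, for which the FI matrix is P(1 - P) a a', and compares that matrix with a finite-difference Hessian of the KL information taken at the true ability point. The parameter values, the step size h, and all function names are hypothetical.

```python
import math

def prob(theta, a, d):
    """Correct-response probability for a compensatory logistic MIRT item."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def kl(theta0, theta, a, d):
    """KL information of the item between true theta0 and another point theta."""
    p0, p = prob(theta0, a, d), prob(theta, a, d)
    return p0 * math.log(p0 / p) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p))

def fisher(theta0, a, d):
    """Fisher information matrix of the item at theta0: P0 * (1 - P0) * a a'."""
    p0 = prob(theta0, a, d)
    return [[aj * ak * p0 * (1.0 - p0) for ak in a] for aj in a]

def kl_hessian(theta0, a, d, h=1e-3):
    """Central finite-difference Hessian of theta -> KL(theta0, theta) at theta0."""
    n = len(theta0)

    def shifted(j, sj, k, sk):
        t = list(theta0)
        t[j] += sj
        t[k] += sk
        return kl(theta0, t, a, d)

    return [[(shifted(j, h, k, h) - shifted(j, h, k, -h)
              - shifted(j, -h, k, h) + shifted(j, -h, k, -h)) / (4.0 * h * h)
             for k in range(n)] for j in range(n)]

# Hypothetical item parameters and true ability point.
A, D = (1.2, 0.7), -0.3
THETA0 = (0.5, -0.25)

FI = fisher(THETA0, A, D)
H = kl_hessian(THETA0, A, D)
for row_fi, row_h in zip(FI, H):
    print([round(v, 5) for v in row_fi], [round(v, 5) for v in row_h])
```

In this example the two matrices agree up to finite-difference error, on the diagonal and off it, which is consistent with the claim that the FI matrix can be recovered from KL.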