Differential Item Functioning: Detection (26)
Chair: Peter van Rijn, Thursday 23rd Jul, 9.40 - 11.00, Boys Smith Room, Fisher Building.
David Magis and Paul de Boeck, Research Group of Quantitative Psychology and Individual Differences, K. U. Leuven, Belgium. A robust statistical approach to differential item functioning. (032) ♥
Lihua Yao, Defense Manpower Data Center, Seaside, CA, USA, and Feiming Li, National Board of Osteopathic Medical Examiners, Chicago IL, USA. A DIF detection procedure in multidimensional framework and its applications. (039)
Michelle Langer, National Board of Medical Examiners, Philadelphia PA, USA. A re-examination of Lord’s Wald test for differential item functioning using item response theory and modern error estimation. (062)
Paul de Boeck, Department of Psychology, K.U. Leuven, Belgium, and Sun-Joo Cho, University of California at Berkeley, USA. Look, can you see where there is DIF? (141)
ABSTRACTS
A robust statistical approach to differential item functioning. (032)
David Magis and Paul de Boeck
The key notion of this talk is to treat items that exhibit differential item functioning (DIF) as outliers, which can be identified with robust statistical tools for inference. The method does not require anchor items and can be performed without stepwise identification and rejection of items. As far as uniform DIF is concerned, the distance between the item difficulties in the focal group and the reference group can be used to identify outliers. Several distances are discussed and compared through a simulation study. It is shown that the Raju (1990) distance, in combination with robust estimates of location and dispersion, performs quite well in comparison with traditional DIF detection methods.
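The outlier idea can be illustrated with a minimal Python sketch. It is not the authors' procedure: it swaps in the plain difference between focal and reference difficulties for the Raju (1990) distance, and uses the median and the scaled median absolute deviation (MAD) as the robust estimates of location and dispersion. The function name `flag_dif_items`, the cutoff of 2.5 and the difficulty values are all assumptions made for illustration.

```python
from statistics import median


def flag_dif_items(b_ref, b_focal, threshold=2.5):
    """Flag items whose difficulty shift is a robust outlier.

    b_ref, b_focal: item difficulty estimates in the reference and
    focal groups, in the same item order (hypothetical inputs).
    threshold: cutoff on the robust z-score (an arbitrary choice here).
    """
    # Simple signed distance per item; the abstract uses the Raju (1990)
    # distance instead, which is not reproduced in this sketch.
    diffs = [f - r for r, f in zip(b_ref, b_focal)]

    # Robust location: the median is barely affected by a few DIF items.
    center = median(diffs)

    # Robust dispersion: MAD scaled by 1.4826 so that it estimates the
    # standard deviation when the non-DIF differences are normal.
    scale = 1.4826 * median(abs(d - center) for d in diffs)
    if scale == 0:
        # Degenerate case: at least half the items share the same shift.
        return [d != center for d in diffs]

    return [abs(d - center) / scale > threshold for d in diffs]


if __name__ == "__main__":
    b_ref = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
    b_focal = [-0.9, -0.45, 0.1, 0.55, 1.9, 1.6]
    print(flag_dif_items(b_ref, b_focal))
```

Because the centre and spread are estimated robustly from all items at once, no anchor set has to be chosen and no item has to be removed step by step, which is the property the abstract emphasises.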
A DIF detection procedure in multidimensional framework and its applications. (039)
Lihua Yao and Feiming Li
DIF detection procedures are available not only in the classical test theory framework but also in the IRT-based framework, such as Lord's chi-square method (Lord, 1980), Raju's area measures (Raju, 1988, 1990), and the likelihood ratio test (Thissen, Steinberg, & Wainer, 1988). These methods are very useful in detecting DIF; however, little progress has been made in understanding the causes of DIF. Benign DIF caused by auxiliary dimensions enhances the construct validity of a test, while adverse DIF resulting from nuisance dimensions lowers construct validity. Benign DIF items are not flagged by an additional DIF analysis in which all construct-relevant dimensions are modeled and included in the conditioning variable. Adverse DIF, however, can be eliminated only by deleting the item or by revising it. Therefore, it is very important to have a procedure that investigates the cause of DIF and detects only adverse DIF. In this study, we developed a DIF detection procedure in a multidimensional framework that flags only those items that have adverse DIF. Items with benign DIF detected by other procedures will not be flagged. The DIF detection procedure proposed in this study was applied to both real data and simulated data and was found to be successful.
A re-examination of Lord’s Wald test for differential item functioning using item response theory and modern error estimation. (062)
Michelle Langer
The item response theory (IRT) model comparison approach has been shown to be the most flexible and powerful method for differential item functioning (DIF) detection; however, it is computationally intensive, requiring many model refittings. The Wald test, originally employed by Lord for DIF detection, is asymptotically equivalent to this approach and requires only one model fitting. In this research, the Wald test for DIF detection was improved from Lord’s original conception through modern error estimation, concurrent calibration, maximum marginal likelihood item parameter estimation, conditional DIF tests, and extensions to commonly used IRT models. This research examined the Type I error and power of the Wald test by varying the magnitude of DIF, the mean difference between groups, test length, and sample size. Data were simulated under the graded response model and the three-parameter logistic (3PL) model. An additional simulation study compared the IRT model comparison approach to the Wald test under the two-parameter logistic model. The results indicated that the Wald test performs well in detecting DIF. Its performance improves with larger sample sizes, greater magnitudes of DIF, greater test lengths, and the random assignment estimation procedure. The Wald test also performs well compared to the IRT model comparison approach.
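For orientation, the basic form of a Lord-type Wald statistic for a single two-parameter logistic item can be sketched in Python as below. This is only the textbook quadratic form, not the improved procedure described in the abstract: it assumes the two groups' estimates are already on a common metric and are independent, so that their covariance matrices can be summed. The function name `lord_wald_2pl` and all numerical inputs are hypothetical.

```python
import math


def lord_wald_2pl(est_ref, est_focal, cov_ref, cov_focal):
    """Wald chi-square for DIF in one 2PL item.

    est_ref, est_focal: (a, b) estimates, i.e. discrimination and
        difficulty, in the reference and focal groups.
    cov_ref, cov_focal: 2x2 covariance matrices of those estimates,
        given as nested sequences.
    Returns (chi2, p_value) with 2 degrees of freedom.
    """
    # Difference vector d between the focal and reference estimates.
    d0 = est_focal[0] - est_ref[0]
    d1 = est_focal[1] - est_ref[1]

    # Covariance of d, assuming independent groups: S = S_ref + S_focal.
    s00 = cov_ref[0][0] + cov_focal[0][0]
    s01 = cov_ref[0][1] + cov_focal[0][1]
    s11 = cov_ref[1][1] + cov_focal[1][1]

    det = s00 * s11 - s01 * s01
    if det <= 0:
        raise ValueError("summed covariance matrix is not positive definite")

    # Quadratic form d' S^{-1} d, using the closed-form 2x2 inverse.
    chi2 = (d0 * d0 * s11 - 2.0 * d0 * d1 * s01 + d1 * d1 * s00) / det

    # For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


if __name__ == "__main__":
    cov = [[0.01, 0.0], [0.0, 0.01]]  # hypothetical error covariances
    print(lord_wald_2pl((1.0, 0.0), (1.0, 0.5), cov, cov))
```

Only the item parameter estimates and their error covariances from one calibration are needed, which reflects the single model fitting the abstract contrasts with the many refittings of the model comparison approach.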
Look, can you see where there is DIF? (141)
Paul de Boeck and Sun-Joo Cho
A number of scatter plots will be shown with estimates of item difficulties and item discriminations in the focal group versus the reference group, together with the confidence intervals of the estimates. Audience members will be invited to indicate for each scatter plot whether they see DIF and, if so, where. The presentation is meant to focus on some conceptual features of DIF and to suggest a DIF typology, starting from visual displays of item estimates in both the focal and the reference group.