Differential Item Functioning: Multilevel and structural equation approaches (29)

Chair: Rianne Janssen. Thursday July 23rd, 11.30 - 12.50, Lowercroft, School of Pythagoras.

Yuk-Fai Cheong, Division of Educational Studies, Emory University, Atlanta, GA, USA. Detection of Differential Item Functioning in problem behavior items via hierarchical cross-classified models. (238)

Suzanne Jak, F. J. Oort, Department of Education, University of Amsterdam, C. V. Dolan, Psychological Methods, University of Amsterdam, C. W. M. van Zoelen, Research & Development, Meurs HRM, Woerden, The Netherlands. Using SEM to detect measurement bias in dichotomous item responses: An application to the measurement of intelligence in higher education. (034)

Ching-Lin Shih, National Taichung University, Taiwan, Wen-Chung Wang, The Hong Kong Institute of Education, Hong Kong. Selecting DIF-free items to serve as anchors for assessment of differential item functioning: The MIMIC method. (129)

Yen-Fang Chen, Ching-Lin Shih, National Chung Cheng University, Taiwan, Wen-Chung Wang, The Hong Kong Institute of Education, Hong Kong. A scale purification procedure for MACS models for assessment of non-uniform DIF. (174)

ABSTRACTS

Detection of Differential Item Functioning in problem behavior items via hierarchical cross-classified models. (238)
Yuk-Fai Cheong
This study explores and illustrates how differential item functioning (DIF) analysis of problem behavior items can be performed via hierarchical cross-classified models (Raudenbush, 1993; Van den Noortgate, De Boeck, & Meulders, 2003). The approach is based on item response theory (IRT) models in which both persons and items are treated as random (De Boeck, 2008). Gender DIF was investigated in primary caregivers' ratings of 2,177 children aged 9-15 on the externalizing behavior problem items of the Child Behavior Checklist/4-18 (Achenbach, 1991). Evidence of gender DIF was detected in a set of items in the Aggression and Delinquency scales. These results were compared and contrasted with those obtained from models in which only person effects were assumed random. Implications of this approach for researchers and psychometricians are discussed.
References:
Achenbach, T. M. (1991). Manual for the Child Behavior Checklist/4-18 and 1991 profile. Burlington: University of Vermont, Department of Psychiatry.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533-539.
Raudenbush, S. W. (1993). A crossed random effects model for unbalanced data with applications in cross-sectional and longitudinal research. Journal of Educational Statistics, 18, 321-349.
Van den Noortgate, W., De Boeck, P., & Meulders, M. (2003). Cross-classification multilevel logistic models in psychometrics. Journal of Educational and Behavioral Statistics, 28, 369-386.
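To make the modeling idea concrete, here is a minimal sketch in Python, not the authors' code, of a cross-classified logistic IRT analysis: person abilities and item parameters both enter as crossed random intercepts, and a fixed gender-by-item interaction (dif_term, an illustrative name, as are the simulated data and all variables) carries the DIF effect for one studied item.

    import numpy as np
    import pandas as pd
    from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

    rng = np.random.default_rng(0)
    n_persons, n_items = 200, 20
    theta = rng.normal(0, 1, n_persons)        # random person abilities
    beta = rng.normal(0, 1, n_items)           # random item difficulties
    gender = rng.integers(0, 2, n_persons)     # illustrative 0/1 coding

    rows = []
    for p in range(n_persons):
        for i in range(n_items):
            dif = 0.6 * gender[p] * (i == 0)   # item 0 functions differently
            eta = theta[p] - beta[i] + dif
            y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
            rows.append((p, i, gender[p], gender[p] * (i == 0), y))
    df = pd.DataFrame(rows, columns=["person", "item", "gender", "dif_term", "resp"])

    # Crossed random intercepts for persons and items; the fixed dif_term
    # coefficient estimates gender DIF for the studied item.
    vc = {"person": "0 + C(person)", "item": "0 + C(item)"}
    model = BinomialBayesMixedGLM.from_formula("resp ~ gender + dif_term", vc, df)
    result = model.fit_vb()                    # variational Bayes estimation
    print(result.summary())

If the posterior summary for dif_term excludes zero, the studied item would be flagged for gender DIF; looping this test over items gives an item-by-item screen analogous to the analysis reported above.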

Using SEM to detect measurement bias in dichotomous item responses: An application to the measurement of intelligence in higher education. (034)
Suzanne Jak, F. J. Oort, C. V. Dolan and C. W. M. van Zoelen
The Dutch “Q1000 Capaciteiten Hoog” test was developed as a general ability test for people with higher education. The objective of this study was to demonstrate how SEM can be used to detect measurement bias in dichotomous item responses, and to evaluate and compare the multi-group (MG) and restricted factor analysis (RFA) detection methods. Bias with respect to age and sex was investigated in one subscale of the test. In the MG method, bias was detected by using modification indices to test group equality constraints on thresholds and factor loadings. In the RFA method, sex and age were investigated simultaneously, by using modification indices to test direct effects of sex and age on the test items. Although SEM is not especially suited to the analysis of dichotomous data, the MG method seemed to work well when applied to tetrachoric correlations and thresholds. Advantages and disadvantages of the MG and RFA methods are discussed. To gain true insight into the effectiveness of the MG and RFA methods in detecting measurement bias in dichotomous item responses, further research with simulated data should be conducted.
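As a worked illustration of the RFA idea (our notation, not taken from the abstract), the potential violators sex and age enter the one-factor measurement model as exogenous variables with direct effects on the latent responses underlying the dichotomous items:

    y^{*}_{ij} = \tau_j + \lambda_j \xi_i + b_{j1}\,\mathrm{sex}_i + b_{j2}\,\mathrm{age}_i + \varepsilon_{ij}

where \xi_i is the common ability factor, \tau_j and \lambda_j are the threshold and loading of item j, and y^{*}_{ij} is the latent response underlying the observed dichotomous score. Item j is unbiased with respect to sex and age if b_{j1} = b_{j2} = 0; the modification indices mentioned above test exactly these zero constraints.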

Selecting DIF-free items to serve as anchors for assessment of differential item functioning: The MIMIC method. (129)
Ching-Lin Shih and Wen-Chung Wang
The MIMIC (multiple indicators, multiple causes) method has been used to assess differential item functioning (DIF) for decades. DIF assessment requires a set of clean (DIF-free) items to serve as a matching variable (anchor) so that the other items in the test can be assessed for DIF. In this study, we employ three variants of the MIMIC method, namely the standard MIMIC method (M-ST), the MIMIC method with a scale purification procedure (M-SP), and the MIMIC method with a pure anchor (M-PA), to select up to ten DIF-free items to serve as anchors. A simulation study was conducted to compare these three methods. Five independent variables were manipulated: (a) item response model; (b) DIF pattern; (c) percentage of DIF items in the test; (d) sample size; and (e) number of anchor items to be selected. The results show that under the “constant” DIF pattern (i.e., all DIF items favored the same group), M-SP was superior to M-ST when the percentage of DIF items was high. However, when the percentage was as high as 40%, M-PA became the best of the three. Differences in accuracy between the methods were larger under small sample sizes than under large sample sizes. Under the balanced DIF pattern (where half of the DIF items favored one group and the other half favored the other group), all three methods performed very similarly.
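The scale purification idea generalizes beyond MIMIC, and a minimal sketch of the iterative loop may help. The details of M-SP are the authors' own, so the following Python fragment is a generic illustration in which dif_flagged is a hypothetical placeholder for a single MIMIC DIF test of one item against a given anchor set:

    def purify(items, dif_flagged, max_iter=20):
        # Start by treating every item as a potential anchor.
        anchors = set(items)
        flagged = set()
        for _ in range(max_iter):
            # Re-test each item for DIF against the current anchors;
            # dif_flagged(item, anchor_set) is a hypothetical DIF test.
            flagged = {i for i in items if dif_flagged(i, anchors - {i})}
            new_anchors = set(items) - flagged
            if new_anchors == anchors:   # flags are stable: converged
                break
            anchors = new_anchors
        # If max_iter is reached, the last (possibly unconverged) sets
        # are returned.
        return anchors, flagged

Roughly speaking, M-ST would correspond to a single pass through this loop (no iteration), while a pure-anchor approach such as M-PA would instead fix a small set of items known to be DIF-free before testing the rest.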

A scale purification procedure for MACS models for assessment of non-uniform DIF. (174)
Yen-Fang Chen, Ching-Lin Shih and Wen-Chung Wang
The mean and covariance structure (MACS) method has been used to assess differential item functioning (DIF) for the last decade. In this study, we added a scale purification procedure to the standard MACS method (denoted MACS-ST), calling the result the MACS method with scale purification (MACS-SP). Through simulations, we compared the performance of MACS-ST and MACS-SP in assessing non-uniform DIF in dichotomous items. Four independent variables were manipulated: (a) DIF detection method; (b) percentage of DIF items in the test; (c) sample size; and (d) ability distribution. The results showed that MACS-SP outperformed MACS-ST in controlling false positive rates while yielding high true positive rates. As the percentage of DIF items increased, MACS-ST performed worse whereas MACS-SP was less affected. Both methods performed better as the sample size increased. Both methods had a higher true positive rate when there was a difference in mean ability between the groups.
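For readers less familiar with MACS terminology, a brief sketch in our own notation (an illustration, not drawn from the paper): for group g, the latent response of person i to item j is modeled as

    y^{*(g)}_{ij} = \tau_j^{(g)} + \lambda_j^{(g)} \xi_i + \varepsilon_{ij}

Uniform DIF corresponds to a group difference in the intercepts \tau_j^{(g)} only, whereas non-uniform DIF, the target of this study, corresponds to a group difference in the loadings \lambda_j^{(g)}, so that the magnitude (and possibly the direction) of the DIF varies with the latent ability \xi_i.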