Item Response Theory: Polytomous items (35)

Chair: Marieke van Onna, Wednesday 22nd July, 15.25 - 16.45, Castlereagh Room, Fisher Building. 

Prathiba Natesan, Florida International University, Hollywood FL, USA. Bayesian estimation of graded response multilevel models. (250) ♥

Tomoya Okubo, The National Center for University Entrance Examinations, Tokyo, Japan, Takahiro Hoshino, School of Economics, Nagoya University, Japan and Shin’ichi Mayekawa, Graduate School of Decision Science and Technology, Tokyo Institute of Technology, Japan. Partially ordered nominal categories model. (189)

Kojiro Shojima, National Center for University Entrance Examinations, Tokyo, Japan. Neural test theory model for graded response data. (166)

Lianghua Shu and Richard Schwarz, CTB/McGraw-Hill, Monterey CA, USA. Estimated reliabilities for a test containing multiple item formats. (28)

ABSTRACTS

Bayesian estimation of graded response multilevel models. (250)
Prathiba Natesan
The most recent development in Item Response Theory (IRT) has been the use of multilevel models to evaluate item response models, commonly known as multilevel item response models or generalized linear mixed models. Such a formulation presents one more reason to view IRT models as statistical models and not as the enigma that some statisticians make them out to be, thus providing a meeting point between statistics and psychometrics. Other advantages include the use of general-purpose software such as WinBUGS or R for the evaluation of IRT models instead of relying on specialized software such as MULTILOG. This study presents the formulation of Samejima’s Graded Response Model with the discrimination parameter in the Non-Linear Mixed Model (NLMM) framework (Directed Acyclic Graph given below) and estimates the Graded Response Multilevel Model (GRMM) using Bayesian priors. In so doing, this simulation study will demonstrate the use of appropriate Bayesian priors in estimating the item parameters for various test lengths and sample sizes. To demonstrate this model, it will be applied to a real data set from an urban school district with a predominantly African American student population that measures the cultural perceptions of urban teachers who are predominantly White.
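As an illustration of the item-level model named in this abstract, the following is a minimal sketch of category probabilities under Samejima's graded response model with a discrimination parameter. It is not the author's multilevel or Bayesian formulation; the function name `grm_category_probs` and the parameter values are assumptions chosen for the example.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    theta: latent trait value; a: item discrimination;
    thresholds: increasing boundaries b_1 < ... < b_{K-1}.
    Returns K probabilities for categories 0..K-1.
    (Hypothetical helper, for illustration only.)
    """
    # Cumulative probabilities P(X >= k), padded with P(X >= 0) = 1
    # and P(X >= K) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Illustrative values only: a four-category item evaluated at theta = 0.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print(probs)
```

In a Bayesian treatment such as the one the abstract describes, these probabilities would form the likelihood, with priors placed on `a`, the thresholds, and `theta`; the choice of priors is not shown here.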

Partially ordered nominal categories model. (189)
Tomoya Okubo, Takahiro Hoshino and Shin’ichi Mayekawa
In this research, a Partially Ordered Nominal Categories Model (PONCM) is proposed. PONCM is an IRT model for data containing both ordered and unordered categorical responses. In other words, PONCM is a partially order-constrained NCM. The differences among the Partial Credit Model (PCM), the Generalized Partial Credit Model (GPCM) and the Nominal Categories Model (NCM) can be expressed as differences in the scoring function, which determines the ordering of the categorical responses. For example, PCM and GPCM have a linear integer scoring function, while NCM is a model whose scoring functions are regarded as unknown parameters that need to be estimated without any order constraint. The proposed model extends the NCM scoring function in such a way as to incorporate partial orders among the response categories. The integrated models are programmed in R, and real data are analysed in order to demonstrate the usefulness of the model. Some extensions of the model are also introduced.
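The role of the scoring function can be sketched as follows: the GPCM is written as a nominal categories model whose category slopes are fixed to the integer scores 0, 1, ..., K-1 times a common discrimination, whereas the NCM leaves the slopes free. This is a minimal sketch of the standard parameterizations only; the partial-order constraint of the PONCM is not specified in the abstract and is not shown. The function names and parameter values are assumptions for the example.

```python
import math

def ncm_probs(theta, slopes, intercepts):
    """Nominal categories model: P(k) is proportional to
    exp(slopes[k] * theta + intercepts[k]); slopes are unconstrained."""
    z = [s * theta + c for s, c in zip(slopes, intercepts)]
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def gpcm_probs(theta, a, steps):
    """GPCM expressed as an NCM with a linear integer scoring function.

    a: common discrimination; steps: step parameters b_1..b_{K-1}.
    """
    scores = range(len(steps) + 1)        # integer scores 0..K-1
    slopes = [a * k for k in scores]      # slope_k = a * k
    intercepts = [0.0]
    for b in steps:                       # intercept_k = -a * (b_1 + ... + b_k)
        intercepts.append(intercepts[-1] - a * b)
    return ncm_probs(theta, slopes, intercepts)

# Illustrative values only.
ordered = gpcm_probs(theta=0.5, a=1.2, steps=[-0.5, 0.4])
unordered = ncm_probs(theta=0.5, slopes=[0.0, 1.5, 0.7], intercepts=[0.0, 0.2, -0.1])
print(ordered, unordered)
```

Setting `a = 1` in `gpcm_probs` gives the PCM; a partially ordered model would sit between the two functions by constraining only some of the slopes.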

Neural test theory model for graded response data. (166)
Kojiro Shojima
Psychological questionnaires have insufficient resolution (or reliability) to detect the slight differences between two people with nearly equal psychological characteristics. The most they can do is classify people into several grades. Therefore, it is not reasonable to assume a continuous scale when representing psychological characteristics. It is sufficient for psychological questionnaires to be used only for graded evaluation. Neural test theory (NTT; Shojima, 2009) was developed as a nonparametric test standardization theory in which the assumed latent scale is not continuous but ordinal ("latent rank scale"). The NTT model is a neural network model using the mechanism of a self-organizing map or generative topographic mapping. NTT was originally developed as a statistical model with a latent rank variable for analyzing dichotomous (true/false) test data. In this study, the NTT model for a psychological questionnaire in Likert format is presented. That is, this model is a statistical model with a polytomous ordinal latent variable for analyzing polytomous ordinal observed variables. This model can standardize psychological questionnaires by grading subjects’ psychological characteristics.

Estimated reliabilities for a test containing multiple item formats. (28)
Lianghua Shu and Richard Schwarz
Two common measures of internal consistency are the split-halves and Cronbach’s alpha reliability coefficients. Due to their assumption of essentially tau-equivalent test parts, these measures are generally inappropriate for multi-format tests containing constructed- and selected-response items. For multi-format tests containing congeneric test parts, Qualls (1995) proposed a reliability coefficient that combines Raju’s (1977) and Feldt’s (Feldt & Brennan, 1989) procedures and compared it with Cronbach’s and stratified alpha. These reliability coefficients are computed from observed data, which limits their utility. Based on IRT item parameters and a given examinee ability distribution, formulas are derived for model-derived estimates of reliability: alpha, stratified-alpha, and Feldt-Raju coefficients, using the three-parameter logistic and two-parameter partial credit models (Muraki, 1992; Yen, 1993). Several data examples are provided that compare the observed and model-derived reliability estimates. Using these examples, factors affecting the choice of a reliability coefficient are discussed. The model-derived estimates can be used in "Spearman-Brown Prophecy" type applications in which an estimate of reliability in the selected test form is needed. In test form selection, model-derived estimates can help ensure that a given level of reliability is achieved that incorporates the information contained in the item parameter estimates.
References: Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 105-146). Washington, DC: American Council on Education; Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176; Qualls, L. A. (1995). Estimating the reliability of a test containing multiple item formats. Applied Measurement in Education, 8(2), 111-120; Raju, N. S. (1977). A generalization of coefficient alpha. Psychometrika, 42, 549-565; Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187-213.
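For orientation, the observed-data versions of two of the coefficients named in this abstract, together with the Spearman-Brown formula, can be sketched as below. This is a minimal sketch using the textbook formulas, not the model-derived estimates the authors propose; the function names are hypothetical and the score data are invented purely for illustration.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of per-item score lists, one entry per examinee."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def stratified_alpha(strata):
    """strata: list of item groups, e.g. selected- and constructed-response."""
    all_items = [item for stratum in strata for item in stratum]
    totals = [sum(scores) for scores in zip(*all_items)]
    penalty = 0.0
    for stratum in strata:
        part = [sum(scores) for scores in zip(*stratum)]
        # Each stratum contributes its error variance: var * (1 - alpha).
        penalty += variance(part) * (1 - cronbach_alpha(stratum))
    return 1 - penalty / variance(totals)

def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Made-up scores for five examinees (illustration only).
selected = [[0, 1, 1, 1, 0], [0, 1, 1, 0, 0], [1, 1, 1, 1, 0]]   # scored 0/1
constructed = [[1, 3, 4, 2, 0], [0, 4, 3, 2, 1]]                 # scored 0-4

print(cronbach_alpha(selected + constructed))
print(stratified_alpha([selected, constructed]))
print(spearman_brown(0.5, 2))
```

Treating each item format as its own stratum is what allows stratified alpha to relax the tau-equivalence assumption across formats while still assuming it within each format.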