Latent Variable Models for Complex Data Structures
Symposium organised by Frank Rijmen (Educational Testing Service, Princeton, USA) and Edward Ip,
Chair: Frank Rijmen, Wednesday 22nd July, 9.55 - 11.15, Palmerston Lecture Theatre, Fisher Building.
Frank Rijmen (Educational Testing Service, Princeton, USA) and Edward Ip,
Roberta Varriale, Olga Lukočienė, and Jeroen Vermunt, Department of Methodology and Statistics,
Frank Rijmen, Educational Testing Service,
Edward Ip,
ABSTRACTS
Symposium overview
Frank Rijmen and Edward Ip
Latent variable models are widely used in the social sciences. The latent structure can offer a parsimonious explanation for dependencies among observed variables, and is often of central interest to the researcher. Dependencies among observed variables may be traced back to different levels of a data structure. For example, in the large-scale Progress in International Reading Literacy Study, items are clustered within blocks, which in turn are clustered within literary versus informational reading ability. Crossed with this organization in blocks, items are classified with respect to the process they measure, such as retrieving explicitly stated information, making inferences, and so on. To capture the most important sources of dependency among the observed variables faithfully, the different levels and modes of a complex data structure should be represented in the latent structure of the model. Complex latent structures call for the development of new estimation methods and model selection procedures. For example, when there is more than one level of latent classes, how should the number of latent classes at each level be determined? The symposium will present innovative latent variable models for complex data structures, with an emphasis on model estimation and selection.
Determining the Number of Components in Multilevel Mixture (Factor) Models
Roberta Varriale, Olga Lukočienė and Jeroen K. Vermunt
Recently, in the social science literature, various types of mixture models have been developed for datasets with a hierarchical or multilevel structure. In two-level datasets, models may include finite mixture distributions at the lower and/or the higher level of the analysis, where lower-level units are usually individuals and higher-level units are groups. While deciding on the number of mixture components is a delicate and complicated task in one-level mixture models, it is even more complex for multilevel mixture models. In our project, we investigate the performance of various model selection methods based on information criteria (IC), such as the Bayesian IC (BIC), Akaike’s IC (AIC), the consistent AIC (CAIC), and AIC3. One difficulty in using BIC and CAIC in the context of multilevel models lies in choosing the appropriate sample size for their formulas: the number of groups, the number of individuals, or a combination of the two. In particular, different sample sizes can be used depending on whether the aim of the research is to determine the number of components at the higher or the lower level of the analysis.
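The sample-size issue can be made concrete with a minimal sketch. It uses the standard textbook definitions of the four criteria; the log-likelihood, parameter count, and sample sizes are hypothetical values chosen for illustration and do not come from the study. The function name `information_criteria` is likewise an assumption of this sketch.

```python
import math


def information_criteria(log_lik, n_params, sample_size):
    """Return AIC, AIC3, BIC and CAIC for one fitted model.

    Only BIC and CAIC depend on the sample size, which is why the
    choice between groups and individuals matters for those two.
    """
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * n_params,
        "AIC3": deviance + 3 * n_params,
        "BIC": deviance + n_params * math.log(sample_size),
        "CAIC": deviance + n_params * (math.log(sample_size) + 1),
    }


# Hypothetical two-level dataset and fitted model (illustrative values only).
n_groups = 50          # higher-level units
n_individuals = 1000   # lower-level units
log_lik = -5230.4      # maximised log-likelihood of one candidate model
n_params = 17          # number of free parameters

by_groups = information_criteria(log_lik, n_params, n_groups)
by_individuals = information_criteria(log_lik, n_params, n_individuals)

for name in ("AIC", "AIC3", "BIC", "CAIC"):
    print(f"{name:5s} groups={by_groups[name]:.1f} "
          f"individuals={by_individuals[name]:.1f}")
```

AIC and AIC3 are the same under both choices, whereas BIC and CAIC penalise each parameter more heavily when the number of individuals is used, so the two choices can favour different numbers of components.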
A Hierarchical Factor IRT Model for Items that are Clustered at Multiple Levels
Frank Rijmen
It is not uncommon for the items of an assessment to be clustered at multiple levels. For example, the large-scale Progress in International Reading Literacy Study consists of item blocks. Each item block consists of a reading passage followed by a set of questions. In addition, blocks of items are clustered within literary versus informational reading ability. The conditional dependencies of the items within an item block, and of item blocks within genre (literary versus informational), can be taken into account by incorporating block- and genre-specific dimensions in addition to a general dimension representing overall reading ability. Such a model is called a hierarchical factor model in the factor analysis community, and reduces to the bi-factor model if there is only one level of clustering. Full maximum likelihood estimation generally becomes computationally very intensive for multidimensional IRT models. However, by exploiting the conditional independence relations between the dimensions that are implied by the hierarchical structure, efficient full maximum likelihood estimation methods can be obtained. Several hierarchical factor IRT models are presented and applied to data stemming from the Progress in International Reading Literacy Study.
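The dimensional structure described above can be sketched as a 0/1 pattern showing which dimensions each item is allowed to load on. This is an illustration only: the block labels, the assignment of blocks to genres, and the function name `loading_pattern` are hypothetical and do not reproduce the actual study design or the estimation method of the talk.

```python
def loading_pattern(item_block, block_genre=None):
    """Build a 0/1 pattern: rows are items, columns are dimensions.

    Every item loads on the general dimension and on the dimension of
    its own block. If a block-to-genre mapping is given, the item also
    loads on the dimension of the genre its block belongs to.
    """
    blocks = sorted(set(item_block))
    genres = sorted(set(block_genre.values())) if block_genre else []
    columns = (["general"]
               + [f"genre:{g}" for g in genres]
               + [f"block:{b}" for b in blocks])
    rows = []
    for block in item_block:
        active = {"general", f"block:{block}"}
        if block_genre:
            active.add(f"genre:{block_genre[block]}")
        rows.append([1 if col in active else 0 for col in columns])
    return columns, rows


# Hypothetical design: eight items, two per block, two blocks per genre.
item_block = ["B1", "B1", "B2", "B2", "B3", "B3", "B4", "B4"]
block_genre = {"B1": "literary", "B2": "literary",
               "B3": "informational", "B4": "informational"}

# Two levels of clustering: general + genre + block dimensions.
columns, rows = loading_pattern(item_block, block_genre)

# One level of clustering: the pattern reduces to a bi-factor structure.
bifactor_columns, bifactor_rows = loading_pattern(item_block)

print(columns)
for row in rows:
    print(row)
```

In the two-level pattern each item loads on exactly three dimensions, and in the one-level pattern on exactly two. The zeros in the pattern are what make the specific dimensions conditionally independent given the dimensions above them in the hierarchy.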
Analyzing Belief Items in Common Sense Models
Edward H. Ip
The measurement of beliefs in areas such as health, social justice, and political systems is an important piece of the puzzle in understanding human behavior and social outcomes. Based on the so-called Common Sense Model of illness (Cameron & Leventhal, 2003), the Common Sense Model Inventory (CSMI) has recently been developed as an instrument for measuring beliefs about various aspects of diabetes among patients with this disorder. Several challenging psychometric issues arise in analyzing the CSMI. First, unlike items in many educational and psychological tests, CSMI items do not have a “correct” answer. The Common Sense Model asserts that individuals’ beliefs about a disorder are based upon somatic symptoms and cumulative life course experiences. As a result, patients’ beliefs about the disorder may differ significantly from the “correct” biomedical model of illness. The primary interest here is the typology of beliefs and mental models, not only how far they deviate from the “correct” biomedical model. Second, there is often a high level of heterogeneity in beliefs about illness; for analytic purposes, unidimensional models do not suffice. Third, the number of belief items in the CSMI is quite large, while the number of patients is quite small (compared to educational testing). The latter is often a result of budget and other constraints. To address these three technical challenges, we propose a two-stage latent class model for analyzing belief inventory data. A sample of N=92 diabetic patients will be used to illustrate the methodology.