Measurement issues (40)

Chair: Keith Markus, Tuesday 21st July, 15.25 - 16.45, Boys Smith Room, Fisher Building. 

Keith A. Markus, Psychology Department, John Jay College of Criminal Justice, City University of New York, USA. How can validity come in degrees? (082)

Henk Kelderman, VU University, Amsterdam, The Netherlands. Can questionnaires satisfy measurement invariance? (165)

Gunter Maris, Cito, University of Amsterdam, The Netherlands. How to score an exam? (161)

ABSTRACTS

How can validity come in degrees? (082)
Keith A. Markus
The idea that test validity comes in degrees has figured in validity theory for at least 80 years. Arguments for it have been pragmatic and intuitive, and the basis for degrees of validity has not been elaborated. Theories of confirmation, including Bayesian confirmation theory, assume degrees of support, which raises the question of how one can obtain degrees of validity without assuming them. Possible bases for ordering degrees of validity include the number of studies, the number of types of evidence, the strength of evidence, and the number of premises in the validity argument that are supported. Four main problems emerge: redundant support, strength of support, irrelevancies, and invariance under re-description. Deductive strength addresses the first and fourth problems, because redundant premises do not increase deductive strength and deductive strength does not depend on how the premises are organized. To address the second and third problems, deductive strength needs to extend beyond the domain of the validity argument in order to capture the relative centrality of beliefs within a broader network. Stronger support then generates more consequences for belief revision. The open texture of scientific theories makes a precise calculus of degree of validity implausible, but a broad theory can nonetheless help guide validation decisions and resolve disagreements.
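The contrast between counting studies and deductive strength can be made concrete with a small sketch. It assumes, purely for illustration and not as the author's formalism, that deductive strength is modelled as the set of atoms derivable by forward chaining over Horn rules; the atom names and the rule are hypothetical. A replicated study raises a study count but leaves the derivable set unchanged, and reordering the premises changes nothing either.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A Horn rule: (set of antecedent atoms, consequent atom).
Rule = Tuple[FrozenSet[str], str]


def closure(facts: Iterable[str], rules: Iterable[Rule]) -> FrozenSet[str]:
    """Return every atom derivable from `facts` by forward chaining over `rules`."""
    known = set(facts)
    rule_list = list(rules)
    changed = True
    while changed:  # repeat until no rule adds a new atom
        changed = False
        for antecedents, consequent in rule_list:
            if consequent not in known and antecedents <= known:
                known.add(consequent)
                changed = True
    return frozenset(known)


# Hypothetical validity argument: two premises jointly support the conclusion.
RULES: List[Rule] = [
    (frozenset({"items_fit_model", "scores_reliable"}), "interpretation_valid"),
]

# Evidence from three hypothetical studies; the third replicates the first.
studies = ["items_fit_model", "scores_reliable", "items_fit_model"]

print(len(studies))  # prints 3: a study count rises with the replication
print(closure(studies[:2], RULES) == closure(studies, RULES))  # prints True
print(closure(list(reversed(studies)), RULES) == closure(studies, RULES))  # prints True
```

Because the closure is a set, redundant support and the order in which premises are listed cannot affect it. That mirrors the abstract's point about the first and fourth problems, though this toy model says nothing about strength of support or irrelevancies.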

Can questionnaires satisfy measurement invariance? (165)
Henk Kelderman
Latent variable models have been developed in the context of cognitive research, where tests are constructed so that each item elicits an instance of an independent stochastic cognitive process whose performance level is represented by a latent variable (Spearman, 1904). In the case of questionnaires, however, there is no such obvious connection between psychological substance and latent variable. In questionnaires, the hypothetical attribute to be measured is related to natural-language propositions towards which the person can hold different mental postures. These postures are virtually constant at the time of measurement, but may not relate exclusively to the hypothetical attribute. Thus, the assumption of local stochastic independence does not seem appropriate in this case. In a seminal research paper on the Big Five personality inventory, McCrae, Zonderman, Costa, Bond and Paunonen (1996) conclude that maximum likelihood factor analysis is systematically flawed when applied to personality questionnaires. In this paper we view measurement invariance from the perspective of a larger network of scientifically relevant variables and discuss the plausibility of different sets of auxiliary model assumptions from a substantive and experimental perspective. We conclude that extending the assessment model with certain explanatory variables and dropping the assumption of local stochastic independence greatly improves the fidelity of the assessment of measurement invariance.

References:
McCrae, R. R., Zonderman, A. B., Costa, P. T., Jr., Bond, M. H., & Paunonen, S. V. (1996). Evaluating replicability of factors in the Revised NEO Personality Inventory: Confirmatory factor analysis versus Procrustes rotation. Journal of Personality and Social Psychology, 70, 552–566.
Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201–293.
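Local stochastic independence means that once the latent variable is held fixed, item responses are uncorrelated. The simulation sketch below shows how a stable posture shared by two items breaks this. It assumes a simple linear factor model with continuous responses, which is not the model used in the paper, and the `posture` term and function names are hypothetical. Every simulated respondent has the same latent value `theta`, so any correlation between the two items comes from the shared posture alone.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def conditional_item_correlation(n, theta, posture_sd, seed=0):
    """Correlation between two items for n respondents who all share latent value theta.

    Each response = theta + posture + item-specific noise (sd 1). The hypothetical
    `posture` is person-specific, shared by both items, and not part of the attribute.
    """
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        posture = rng.gauss(0.0, posture_sd)  # same value enters both items
        y1.append(theta + posture + rng.gauss(0.0, 1.0))
        y2.append(theta + posture + rng.gauss(0.0, 1.0))
    return pearson(y1, y2)


# Latent variable held fixed; only the shared posture differs between the two runs.
print(conditional_item_correlation(20000, theta=0.0, posture_sd=0.0))
print(conditional_item_correlation(20000, theta=0.0, posture_sd=1.0))
```

The first value should be close to 0, as local independence requires. The second should be close to 0.5, the posture variance divided by the total item variance, even though the latent variable is identical for everyone.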

How to score an exam? (161)
Gunter Maris
In educational measurement there are two key problems. First, we need to know how to score an exam. Second, we need to consider how scores on different exams can be equated. We deal with the first of these. The purpose of this presentation is to develop a method for constructing impartial scoring rules. Any scoring rule reduces the information about a person's ability contained in the responses to individual questions to a single number. A scoring rule is impartial if (ordinal) inferences based on the score do not contradict those based on the complete pattern of responses. If a scoring rule is not impartial, the score and the full response pattern may lead to different decisions. Loosely speaking, a person could be told that, once all of their responses have been reduced to a single number, they fail the exam, whereas a decision based on the individual responses would have them pass. The method developed here for constructing impartial scoring rules is firmly rooted in both the representational theory of measurement and Item Response Theory. It provides one of the first instances, if not the first, of an extensive measure in the social sciences, whose (im)possibility has long been debated.
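One way to make "impartial" concrete is the following. This is assumed here for illustration and is not the construction presented in the talk. Suppose an item response model has increasing item response functions and local independence, and response pattern x answers correctly every item that pattern y does (x dominates y). Then the likelihood ratio P(x | θ)/P(y | θ) is nondecreasing in θ, so the full responses never point to lower ability for x than for y. A scoring rule that ranks y above x contradicts that inference. The sketch checks this necessary condition by brute force over all binary patterns; the function names are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Pattern = Tuple[int, ...]


def dominates(x: Pattern, y: Pattern) -> bool:
    """True if x answers correctly every item that y answers correctly."""
    return all(xi >= yi for xi, yi in zip(x, y))


def is_impartial(score: Callable[[Pattern], float], n_items: int) -> bool:
    """Check that `score` never ranks a dominated pattern above one that dominates it."""
    patterns = list(product((0, 1), repeat=n_items))
    for x in patterns:
        for y in patterns:
            if dominates(x, y) and score(x) < score(y):
                return False
    return True


def weighted_score(weights: Sequence[float]) -> Callable[[Pattern], float]:
    """Hypothetical weighted-sum scoring rule over binary item responses."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x))


print(is_impartial(sum, 4))  # prints True: number-correct passes
print(is_impartial(weighted_score([1.0, 1.0, -0.5]), 3))  # prints False
```

The number-correct score, and any weighted sum with nonnegative weights, passes this check. A rule that penalises a correct answer fails, because (0, 0, 1) dominates (0, 0, 0) yet scores lower. Passing is only a necessary condition: it does not show that a rule preserves every ordinal inference the full response pattern supports.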