Poster Session One: Foyer, Tuesday July 21st, 18.00 - 19.00
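Goran Lazendic. Modelling growth of students’ achievement in international competitions and assessments for schools in mathematics. (093)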
Chuan-Ju Lin, National University of Taiwan. Controlling test overlap rate in the automated assembly of alternate tests. (107)
Eunjung Lee, University of Iowa, USA, Won-Chan Lee. Evaluation of equating results based on the first and second order equity properties. (116)
Norikazu Iwama, Graduate School of Humanities and Social Sciences, Waseda University, Saitama, Japan, Hideki Toyoda. Structural Equation Modeling with selective ADF3 for reducing estimation time. (149)
Kosuke Fukunaka, School of Humanities and Social Sciences, Waseda University, Tokyo, Japan, Hideki Toyoda. Graphical modeling for factors by using rotation. (150)
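Marie-Anne Mittelhaëuser, Tilburg University, The Netherlands, Wilco Emons. Using person-fit analysis for developing revised scales: Application to Type-D personality assessment. (168)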
Kentaro Nakamura, Saitama Gakuen University, Japan. Scale development considered reliability and validity simultaneously using a genetic algorithm. (169)
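Koken Ozaki, Japan Society for the Promotion of Science, Japan, Kentaro Nakamura and Hiroto Murohashi. Non-normal Structural Equation Modeling on multilevel data. (170)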
Yusuke Miyamoto, Osaka University, Japan. A factor rotation criterion log-linear model. (187)
Margot Bennink, Tilburg University, The Netherlands, M.A. Croon. Comparing methods for constructing confidence intervals for indirect effects in a multilevel setting with the outcome variable at team level. (217)
Carmen Ximenez, Departamento de Psicologia Social y Metodologia, Universidad Autonoma de Madrid, Spain. Recovery of weak factor loadings in Confirmatory Factor Analysis under conditions of model misspecification. (027)
Mio Tsubakimoto, Interfaculty Initiative in Information Studies, University of Tokyo, Japan, Masayoshi Yanagisawa and Kanji Akahori. Examination of Aspects of Term Paper-Grading Assisted by Visualization. (143)
Takuya Ohmori, Kazuo Shigemasu. Bayesian diagnostic model in testlet adaptive testing. (218)
Bian Yufang, National Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, China, Wang Yehui. Development of standardized Chinese Achievement Test for Junior High School students. (094)
Hibist Astatke, Population Services International, Washington, DC, USA, Mesfin Mulatu, Peter Buyungo, Daniel Crapper and Tsega Berhanu. Testing the applicability of a theoretical model to predict ITN use in Ethiopia. (142)
Michalis P. Michaelides,
Bo Zhang, University of Wisconsin, Milwaukee, USA, Jehanzeb Cheema. Modeling of achievement gaps using international test data. (110)
Alexandra Bacopoulos-Viau, University of Cambridge, UK, John Rust. The creation of the first psychology laboratory in Britain: Cambridge, 1886-1888. (261)
ABSTRACTS
Modelling growth of students’ achievement in international competitions and assessments for schools in mathematics. (093)
Goran Lazendic
Educational Assessment Australia at the University of New South Wales conducts the International Competitions and Assessments for Schools (ICAS) in Mathematics in a large number of schools in Australia and internationally. The longitudinal data collected in ICAS-Mathematics are hierarchically structured by jurisdictions and schools. In educational measurement, such data are typically analysed using hierarchical linear modelling, an approach in which a single set of growth parameters is taken to approximate growth for all members of a cohort. However, this is not an optimal solution when an assessment aims to provide information about individual students. In contrast, growth mixture modelling (GMM) and latent class growth analysis (LCGA) make it possible to investigate intra-individual growth while accounting for inter-individual commonalities. To this end, GMM and LCGA were used to model growth in ICAS-Mathematics achievement for students from Year 3 to Year 6. Gender and first language were modelled as student-level covariates, while jurisdiction and school type were modelled as school-level covariates. Monte Carlo studies were conducted to determine an appropriate sample size. The analysis indicates that GMM with a single latent class of students provides the best fit to the data.
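To make the LCGA idea concrete, here is a minimal sketch, not the author's model: each latent class has its own linear trajectory (intercept and slope over time), all classes share one residual variance, and the model is fitted by EM. The function name `fit_lcga`, the coding of Year 3 to Year 6 as times 0 to 3, and the absence of covariates are all assumptions for illustration. A full GMM would additionally allow random intercepts and slopes within each class.

```python
import math


def fit_lcga(data, times, n_classes=2, n_iter=100):
    """Hypothetical LCGA sketch: linear class trajectories fitted by EM.

    data[i][t] is student i's score at times[t]. Each latent class k has its
    own intercept and slope; all classes share one residual variance, and
    there are no random effects within a class (unlike a full GMM).
    """
    n, n_t = len(data), len(times)
    # Deterministic start: split students into equal-sized groups by total score.
    order = sorted(range(n), key=lambda i: sum(data[i]))
    resp = [[0.0] * n_classes for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][rank * n_classes // n] = 1.0
    for _ in range(n_iter):
        # M-step: weighted least squares for each class trajectory.
        params, weights = [], []
        for k in range(n_classes):
            w = sum(r[k] for r in resp)
            s0, s1 = w * n_t, w * sum(times)
            s2 = w * sum(t * t for t in times)
            sy = sum(resp[i][k] * sum(data[i]) for i in range(n))
            sty = sum(resp[i][k] * sum(t * y for t, y in zip(times, data[i]))
                      for i in range(n))
            slope = (s0 * sty - s1 * sy) / (s0 * s2 - s1 * s1)
            params.append(((sy - slope * s1) / s0, slope))
            weights.append(w / n)
        # Shared residual variance, weighted by class responsibilities.
        sigma2 = sum(resp[i][k] * (y - a - b * t) ** 2
                     for i in range(n) for k, (a, b) in enumerate(params)
                     for t, y in zip(times, data[i])) / (n * n_t)
        # E-step: posterior class probabilities, computed on the log scale.
        for i in range(n):
            logs = []
            for k, (a, b) in enumerate(params):
                ll = math.log(weights[k])
                for t, y in zip(times, data[i]):
                    ll -= (0.5 * math.log(2 * math.pi * sigma2)
                           + (y - a - b * t) ** 2 / (2 * sigma2))
                logs.append(ll)
            top = max(logs)
            z = sum(math.exp(v - top) for v in logs)
            resp[i] = [math.exp(v - top) / z for v in logs]
    return params, weights, sigma2
```

With well-separated simulated groups, the recovered class trajectories should sit close to the generating intercepts and slopes; with real data, several starts and class counts would normally be compared on fit indices.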
Controlling test overlap rate in the automated assembly of alternate tests. (107)
Chuan-Ju Lin
In the context of assembling equivalent test forms, the test overlap rate can be extremely high without item-exposure control, because the items selected to satisfy the constraints (e.g., a target test information function) tend to be almost the same across multiple test forms. If enough items appear frequently on many forms, test security and validity are called into question. Test overlap control should therefore be an important concern, and it may be even more crucial for automated test assembly (ATA) than for computerized adaptive testing (CAT). The purpose of this study is to compare two exposure control procedures designed to minimize the test overlap rate between pairs of test forms while producing tests that still meet content and psychometric constraints. This paper applies the random process of Lin and Spray (2001) and the ordered-item-pooling control of Chen and Lei (2009), outside the ATA process, to control item exposure and the average test overlap rate across multiple test forms. That is, the two methods were used to control the administration rate of an item after the item had been selected by the test-assembly algorithm. A simulation study was conducted to examine the extent of item usage control under the two exposure control procedures.
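The average test overlap rate that both procedures aim to reduce can be computed directly from a set of assembled forms. The sketch below assumes equal-length forms given as lists of item IDs (a hypothetical representation; it does not implement either exposure control procedure). It computes the rate pairwise and, as a cross-check, from item exposure rates using the counting identity T = n * sum(r_i^2) / ((n - 1) * L) - 1 / (n - 1), where n is the number of forms, L the form length and r_i the proportion of forms containing item i.

```python
from collections import Counter
from itertools import combinations


def average_overlap_rate(forms):
    """Mean proportion of shared items over all pairs of equal-length forms."""
    length = len(forms[0])
    pairs = list(combinations(forms, 2))
    shared = sum(len(set(a) & set(b)) for a, b in pairs)
    return shared / (len(pairs) * length)


def overlap_from_exposure(forms):
    """Same quantity computed from item exposure rates r_i = count_i / n."""
    n, length = len(forms), len(forms[0])
    counts = Counter(item for form in forms for item in form)
    sum_r2 = sum((c / n) ** 2 for c in counts.values())
    return n * sum_r2 / ((n - 1) * length) - 1 / (n - 1)


forms = [[1, 2, 3], [1, 2, 4], [1, 5, 6]]
print(average_overlap_rate(forms), overlap_from_exposure(forms))
```

The exposure-rate form shows why controlling how often each item is administered, as both procedures do after selection, directly lowers the average overlap: the rate depends on the sum of squared exposure rates.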
Evaluation of equating results based on the first and second order equity properties. (116)
Eunjung Lee
As a wide variety of equating methods have been developed, evaluating the results of different equating methods has become an increasingly important issue. The primary purpose of the current study is to provide empirical evidence that extends our understanding of the adequacy of various equating methods with respect to the so-called first- and second-order equity properties. The study examines the performance of a variety of equating methods, including the item response theory (IRT) true score method, the IRT observed score method, the presmoothed equipercentile method, the postsmoothed equipercentile method, and the modified frequency estimation equating method. Both the three-parameter logistic (3PL) and Rasch models are used for the IRT equating methods. In addition, the study evaluates the performance of averaging the results of different equating methods (e.g., averaging the IRT true and observed score equating results). Real data sets are analyzed, and a random groups design is employed for equating. Each of the 3PL and Rasch models is used to compute the conditional expected scores and conditional standard errors of measurement.
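As an illustration of one of the methods listed, the following is a minimal sketch of IRT true score equating under the Rasch model. It assumes item difficulties for both forms are already on a common scale (as after calibration in a random groups design); the function names and the bisection bounds of -10 and 10 are hypothetical choices. A score x on form X is mapped to the ability at which the form X true score equals x, and that ability is then mapped to the form Y true score.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def true_score(theta, difficulties):
    """Expected number-correct score at ability theta."""
    return sum(rasch_p(theta, b) for b in difficulties)


def theta_for_true_score(score, difficulties, lo=-10.0, hi=10.0, tol=1e-10):
    # The true score is strictly increasing in theta, so bisection works for
    # scores strictly between the true scores at lo and hi.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if true_score(mid, difficulties) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def irt_true_score_equate(x, diffs_x, diffs_y):
    """Form Y equivalent of form X score x under Rasch true score equating."""
    theta = theta_for_true_score(x, diffs_x)
    return true_score(theta, diffs_y)


d = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(irt_true_score_equate(3.0, d, [b - 0.5 for b in d]))
```

If form Y is uniformly easier, the equated score exceeds x, and equating a form to itself returns x unchanged; the conditional expected scores used to check first-order equity are the same `true_score` values evaluated at fixed abilities.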
Structural Equation Modeling with selective ADF3 for reducing estimation time. (149)
Norikazu Iwama
In structural equation modeling using higher-order moments, it is difficult to estimate a model with many observed variables, because the matrix and vectors used to construct the objective function can become very large and computing them takes too much time. In this study, we propose an estimation method, named the selective ADF3 method, to resolve this problem. In this method, we select a subset of the third-order moments used for estimation according to the proposed rules, thereby reducing the size of the above-mentioned matrix and vectors and shortening the estimation time. We confirmed the effectiveness of the method through a simulation study and an application to real data. The simulation study shows that a model with many variables can be estimated much more quickly with the proposed method than with an existing estimation method. In addition, the application study yields estimation results that can be interpreted appropriately.
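The growth that motivates the selective ADF3 method can be seen by counting moments. With p observed variables there are p(p + 1)/2 distinct second-order and p(p + 1)(p + 2)/6 distinct third-order moments. The sketch below assumes, as an illustration rather than a description of the authors' estimator, that the objective stacks all of these moments and weights them with a full square matrix; the selection rules themselves are not reproduced.

```python
def moment_counts(p):
    """Distinct 2nd- and 3rd-order moments for p variables, and the number of
    entries in a full weight matrix over both (an assumed objective form)."""
    second = p * (p + 1) // 2
    third = p * (p + 1) * (p + 2) // 6
    return second, third, (second + third) ** 2


for p in (5, 10, 20, 30):
    print(p, moment_counts(p))
```

Because the third-order count grows cubically in p, and the weight matrix grows with its square, keeping only a subset of third-order moments shrinks the problem far more than it would for second-order moments alone.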
Graphical modeling for factors by using rotation. (150)
Kosuke Fukunaka and Hideki Toyoda
In this presentation, we show that graphical modeling for factors can be carried out by using rotation in exploratory factor analysis. In the previous method, a partial correlation matrix was calculated from the factor correlation matrix, and the parameters were repeatedly re-estimated with the smallest partial correlation fixed to zero. However, this method does not take into account that fixing some factor correlations to zero distorts the properties of the factors. In this study, we address this problem and propose a new exploratory modeling method that searches both the CFA measurement part and the factor correlations at the same time.
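The first step of the previous method described above, computing partial correlations from a (factor) correlation matrix and locating the smallest one, can be sketched as follows. It uses the standard relation rho_ij.rest = -P_ij / sqrt(P_ii * P_jj) with P the inverse of the correlation matrix. The helper names are hypothetical, the inverse is a plain Gauss-Jordan elimination for small matrices, and the re-estimation step is not shown.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of each pair given all remaining variables."""
    p = invert(corr)
    n = len(corr)
    return [[1.0 if i == j else -p[i][j] / math.sqrt(p[i][i] * p[j][j])
             for j in range(n)] for i in range(n)]


def weakest_edge(partial):
    """Pair (i, j) with the smallest absolute partial correlation."""
    n = len(partial)
    return min(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda e: abs(partial[e[0]][e[1]]))


# Chain structure: factors 1 and 3 are related only through factor 2.
phi = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
print(weakest_edge(partial_correlations(phi)))
```

In this chain example the partial correlation between the first and third factors is zero, so that edge is the one the previous method would fix to zero; the abstract's point is that doing so on the factor correlations themselves can distort the factors.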
Using person-fit analysis for developing revised scales: Application to Type-D personality assessment. (168)
Marie-Anne Mittelhaëuser and Wilco Emons
When drawing conclusions about an individual on the basis of test scores, it is of major importance that the observed scores validly represent the underlying trait. Unexpected response behavior, resulting, for example, from careless responding to the wording of items or from a tendency to choose extreme response options, may distort the validity of the scores. It is therefore important to identify respondents whose scores may be invalid indicators of the underlying construct. In this study, we used person-fit analysis to detect aberrant item-score vectors across two tests measuring the same underlying construct, in order to evaluate the comparability of the two measurements at the individual level. We used the Lz-statistic for polytomous items and a sum-score-based approach. The Lz-statistic was used to detect global misfit, whereas the sum-score-based approach was used to detect misfit that manifests itself locally in an individual item-score vector. This local method may help to identify possible explanations for person misfit. Because the theoretical null distribution of the Lz-statistic is biased, its p-values were calculated by means of parametric bootstrapping. Results from an empirical data analysis, complemented with simulation studies, will be presented.
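A minimal sketch of the polytomous Lz-statistic with a parametric bootstrap p-value is given below. It assumes the model-implied category probabilities for the respondent are already available (e.g., from a fitted polytomous IRT model at the estimated trait level) and, as a simplification, keeps them fixed across bootstrap replicates instead of re-estimating the trait level for each simulated vector. The function names are hypothetical. Large negative Lz values indicate misfit.

```python
import math
import random


def lz_poly(responses, probs):
    """Standardized log-likelihood person-fit statistic for polytomous items.

    responses[i] is the observed category of item i; probs[i][k] is the model
    probability of category k for item i (all probabilities must be > 0).
    """
    l0 = sum(math.log(probs[i][x]) for i, x in enumerate(responses))
    expected, variance = 0.0, 0.0
    for p in probs:
        m = sum(pk * math.log(pk) for pk in p)
        expected += m
        variance += sum(pk * math.log(pk) ** 2 for pk in p) - m ** 2
    return (l0 - expected) / math.sqrt(variance)


def bootstrap_p_value(responses, probs, n_boot=2000, seed=0):
    """Lower-tail p-value of the observed Lz from simulated response vectors."""
    rng = random.Random(seed)
    observed = lz_poly(responses, probs)
    hits = 0
    for _ in range(n_boot):
        sim = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
        if lz_poly(sim, probs) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_boot + 1)


probs = [[0.7, 0.2, 0.1]] * 10
print(bootstrap_p_value([2] * 10, probs))
```

Consistently choosing the least likely category on every item yields a strongly negative Lz and a small bootstrap p-value, while choosing the most likely category yields a large p-value.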
Scale development considered reliability and validity simultaneously using a genetic algorithm. (169)
Kentaro Nakamura
In the behavioral sciences, researchers frequently develop scales, tests, or questionnaires to measure traits of interest that are not directly observable. During construction and development, scales are repeatedly revised to achieve high reliability and validity. However, it is difficult to select items manually so that both are high, because there are too many possible combinations of items. In this study, a genetic algorithm is applied to select sets of items that maintain high reliability and high validity simultaneously. The performance of the method is investigated through a simulation study.
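The sketch below shows one way a genetic algorithm can search fixed-size item subsets for joint reliability and validity. It is an illustration under stated assumptions, not the author's algorithm: reliability is taken to be Cronbach's alpha computed from the item covariance matrix, validity is the correlation of the sum score with an external criterion (from item-criterion covariances), and the fitness is a hypothetical equal-weight average of the two. The selection, crossover and mutation operators are also illustrative choices.

```python
import math
import random


def alpha_and_validity(subset, cov, crit_cov, crit_var):
    """Cronbach's alpha and sum-score/criterion correlation for an item subset."""
    m = len(subset)
    total_var = sum(cov[i][j] for i in subset for j in subset)
    item_var = sum(cov[i][i] for i in subset)
    alpha = m / (m - 1) * (1 - item_var / total_var)
    validity = sum(crit_cov[i] for i in subset) / math.sqrt(total_var * crit_var)
    return alpha, validity


def ga_select(cov, crit_cov, crit_var, m, pop_size=40, generations=60,
              weight=0.5, seed=0):
    rng = random.Random(seed)
    k = len(cov)

    def score(subset):
        a, v = alpha_and_validity(subset, cov, crit_cov, crit_var)
        return weight * a + (1 - weight) * v  # hypothetical combined fitness

    pop = [tuple(sorted(rng.sample(range(k), m))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: keep the two best subsets unchanged
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            pool = list(set(p1) | set(p2))
            child = set(rng.sample(pool, m))  # crossover: m items from the union
            if rng.random() < 0.3:  # mutation: swap one item for another
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([i for i in range(k) if i not in child]))
            next_pop.append(tuple(sorted(child)))
        pop = next_pop
    best = max(pop, key=score)
    return best, alpha_and_validity(best, cov, crit_cov, crit_var)
```

In a toy pool where four items are intercorrelated and related to the criterion while four others are pure noise, the search should return the four informative items; changing `weight` trades reliability against validity.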
Non-normal Structural Equation Modeling on multilevel data. (170)
Koken Ozaki, Kentaro Nakamura and Hiroto Murohashi
When data are obtained by stratified sampling, multilevel modeling has to be used to analyze them. In this study, a new multilevel model will be developed within the framework of non-normal structural equation modeling. The developed method enables us to detect the direction of causation at each level. For example, study hours probably affect a student’s grade at the individual level, whereas the mean grade within a school may affect the mean study hours at the school level. The developed method can address such hypotheses. A simulation study will be provided to examine the characteristics of the method.
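Why non-normality can reveal causal direction can be shown with a single-level toy example; this is not the authors' multilevel method. Assume y = rho * x + e with x and y standardized, e independent of x and symmetric. Then skew(y) = rho^3 * skew(x), so the cause is the more skewed variable. The sketch below applies that heuristic; the function names and the simulated data are hypothetical.

```python
import math
import random


def standardize(xs):
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def skewness(xs):
    z = standardize(xs)
    return sum(v ** 3 for v in z) / len(z)


def likely_direction(x, y):
    """Skewness heuristic: assumes a linear link with a symmetric error term."""
    return "x->y" if abs(skewness(x)) > abs(skewness(y)) else "y->x"


rng = random.Random(42)
x = [rng.expovariate(1.0) for _ in range(20000)]  # skewed cause
y = [0.6 * (xi - 1.0) + rng.gauss(0.0, 0.8) for xi in x]  # symmetric error
print(likely_direction(x, y))
```

Here skew(x) is about 2 and skew(y) about 0.6^3 * 2, so the heuristic recovers x as the cause; a multilevel version would apply such reasoning separately to within-school and between-school components.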
A factor rotation criterion log-linear model. (187)
Yusuke Miyamoto
Orthomax-family factor rotations, such as varimax or quartimax, tend to fail if the "best" (in the sense of interpretability) loading matrix has an imperfectly simple structure, for example, more than one substantial loading per row. This is because they are too strict about (Thurstone’s) simple structure. Other criteria, such as geomin rotation, on the other hand, tend to overfit such a structure and fail to reach row-wise simplicity. In this study, a criterion for oblique factor rotation is proposed. Knüsel (2008) proposed Chisquaremax, in which chi-square statistics are adopted as a rotation criterion. We attempt to extend this idea to a log-linear model formulation, generalising the existing rotation criteria.
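The strictness of the orthomax family can be seen from its criterion, sum over factors j of [sum_i lambda_ij^4 - (gamma / p) * (sum_i lambda_ij^2)^2], where p is the number of variables, gamma = 1 gives varimax and gamma = 0 gives quartimax. The sketch below (function name hypothetical) evaluates it for a perfectly simple loading matrix and for the same matrix rotated by 45 degrees, in which every row has two substantial loadings.

```python
import math


def orthomax_criterion(loadings, gamma=1.0):
    """Orthomax criterion; gamma=1 is varimax, gamma=0 is quartimax."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        col_sq = [loadings[i][j] ** 2 for i in range(p)]
        total += sum(v ** 2 for v in col_sq) - gamma / p * sum(col_sq) ** 2
    return total


simple = [[0.8, 0.0], [0.8, 0.0], [0.0, 0.8], [0.0, 0.8]]
c = 0.8 / math.sqrt(2.0)
rotated = [[c, c], [c, c], [c, -c], [c, -c]]  # simple, rotated by 45 degrees
print(orthomax_criterion(simple), orthomax_criterion(rotated))
```

The simple matrix scores 0.8192 under varimax, while the rotated one scores essentially zero, which illustrates why these criteria favour one dominant loading per row and can miss a "best" solution with genuinely complex rows.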
Comparing methods for constructing confidence intervals for indirect effects in a multilevel setting with the outcome variable at team level. (217)
Margot Bennink and M.A. Croon
Multilevel analysis is used to investigate relationships between variables measured at different hierarchical levels, such as variables measured at the team level and variables measured at the level of employees within teams. Until recently, methods were available only for situations in which the dependent variables were measured at the lower, individual level. Croon and van Veldhoven (2007) proposed a latent variable approach for modeling the effect of individual-level variables on dependent variables measured at the higher, team level. This model (and extensions thereof) can be recast as a two-level path model, so that its parameters can be estimated with Mplus. In this poster, several methods for constructing confidence intervals (CIs) for mediated effects in the two-level path model will be discussed and compared. Some of these CI construction methods assume that the mediated effects are normally distributed, while others rely on parametric and non-parametric bootstrapping procedures appropriate for multilevel designs. The comparison of the selected CI methods will be based on simulated and real data.
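To show the contrast between the two families of CI methods, the sketch below computes a normal-theory interval for an indirect effect a*b (first-order delta-method standard error, as in the Sobel approach) and a parametric Monte Carlo interval obtained by simulating a and b from their sampling distributions. It is a single-level illustration, not the poster's two-level procedures: it assumes the estimates of a and b are independent and approximately normal, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def normal_theory_ci(a, se_a, b, se_b, level=0.95):
    """Symmetric CI for a*b using the first-order delta-method standard error."""
    se_ab = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return a * b - z * se_ab, a * b + z * se_ab


def monte_carlo_ci(a, se_a, b, se_b, level=0.95, draws=20000, seed=0):
    """Percentile CI from simulated products (assumes independent estimates)."""
    rng = random.Random(seed)
    products = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b)
                      for _ in range(draws))
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return products[lo_idx], products[hi_idx]


print(normal_theory_ci(0.5, 0.1, 0.4, 0.1))
print(monte_carlo_ci(0.5, 0.1, 0.4, 0.1))
```

The normal-theory interval is symmetric around a*b by construction, whereas the simulated interval can be asymmetric because the product of two normal variables is generally skewed; that asymmetry is the usual motivation for resampling-based CIs for mediated effects.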
Recovery of weak factor loadings in Confirmatory Factor Analysis under conditions of model misspecification. (027)
Carmen Ximenez
This work presents the results of two Monte Carlo simulation studies of the recovery of weak factor loadings in confirmatory factor analysis for models that do not hold exactly in the population. This issue has not been examined in previous research. Model error was introduced using the procedure developed by Cudeck and Browne (1992), which allows the user to specify a covariance structure with a specified discrepancy in the population. This method was chosen because it does not require the specific nature of the model error to be designated (i.e., it potentially includes all possible types of error). The effects of sample size, estimation method (ML vs. ULS), and factor correlation were also considered. The first simulation study examined recovery for correctly specified models with the known number of factors, and the second examined recovery for models misspecified by underfactoring. The results showed that recovery was unaffected by model discrepancy for the correctly specified models but was affected for the misspecified models. The results also suggested that, in both studies, recovery improved when the factors were correlated and that ULS performed better than ML in recovering the weak factor loadings.
Examination of Aspects of Term Paper-Grading Assisted by Visualization. (143)
Mio Tsubakimoto, Masayoshi Yanagisawa and Kanji Akahori
This paper considers the applicability of a term-paper grading assistance map based on the techniques developed by Deerwester et al. (1990) (latent semantic analysis) by examining the results of a grading simulation that used specific text evaluation criteria. For this simulation, visualization maps were created for two sets of papers with different themes and text structures, using the results of pre-scoring by human raters. The results showed that the visualized information was easier to view for reports written from a set viewpoint (for example, pro or con) than for open-ended reports whose content and form were not specified. Moreover, text evaluation criteria that examine the comprehensibility of characteristic words associated with the particular content of the papers showed good validity in conjunction with the visualization method. These results clarify the kinds of text structures and evaluation criteria for which grading assistance using information visualization techniques is particularly applicable.
Bayesian diagnostic model in testlet adaptive testing. (218)
Takuya Ohmori and Kazuo Shigemasu
This study presents a new adaptive testing method that uses a Bayesian network model to diagnose students’ misconceptions or errors (called bugs). Item response theory (IRT) models are often used in ordinary adaptive testing, but when tests are made up of testlets, standard IRT models are often inappropriate because they assume local independence between items, an assumption that testlets violate. A few testlet-based IRT models have recently been developed for such conditions (e.g., Wainer et al., 2000), but they are not numerous. To avoid the local independence problem, this study introduces a Bayesian network model that represents the structure within each testlet, as well as the relations between testlets, directly and intuitively. Furthermore, by incorporating a bug model into adaptive testing, students who have not fully mastered the material can come to understand their own abilities. The Expected Value of Sample Information (EVSI) criterion, which is grounded in decision theory, is used to select testlets for adaptive testing. A simulation study supports the validity of the method, and the method is also applied to real (arithmetic) data.
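A much-simplified sketch of EVSI-based testlet selection follows. It assumes a discrete set of diagnostic states (e.g., "bug present" and "mastered"), a 0-1 utility for classifying the student into the correct state, and, unlike the Bayesian network in the study, items that are conditionally independent within a testlet given the state. Under that utility, EVSI is the expected maximum posterior probability after observing the testlet minus the maximum prior probability. All names and item parameters are hypothetical.

```python
from itertools import product


def evsi(prior, testlet):
    """EVSI of a testlet under 0-1 classification utility.

    prior[s] is P(state s); each item in testlet is a list with
    item[s] = P(correct | state s). Items are treated as conditionally
    independent given the state (a simplifying assumption).
    """
    value = 0.0
    for pattern in product((0, 1), repeat=len(testlet)):
        joint = []
        for s, p_s in enumerate(prior):
            lik = 1.0
            for item, x in zip(testlet, pattern):
                lik *= item[s] if x else 1 - item[s]
            joint.append(p_s * lik)
        # P(pattern) * max posterior equals the largest joint probability.
        value += max(joint)
    return value - max(prior)


def choose_testlet(prior, testlets):
    """Index of the testlet with the largest EVSI."""
    return max(range(len(testlets)), key=lambda t: evsi(prior, testlets[t]))


prior = [0.5, 0.5]
testlets = [[[0.5, 0.5]], [[0.9, 0.2]], [[0.9, 0.2], [0.8, 0.3]]]
print([round(evsi(prior, t), 4) for t in testlets], choose_testlet(prior, testlets))
```

An item whose success probability does not depend on the state has zero EVSI, so the criterion naturally prefers testlets that discriminate between the diagnostic states; in practice the EVSI would be weighed against testing cost.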
Development of standardized Chinese Achievement Test for Junior High School students. (094)
Bian Yufang and Wang Yehui
In order to provide accountability information for the public and to set a national norm of Chinese achievement for junior high school students, the study aimed to develop standardized Chinese achievement test batteries. Based on an analysis of several different versions of textbooks, the researchers drafted a general test specification in which content was selected from the intersection of the textbooks. The test specification was defined jointly by content domains and cognitive processes. The content domain includes language literacy (comprising language knowledge and culture knowledge) and reading (comprising information context and literacy context). For language literacy, the cognitive processes are knowing and remembering, understanding and interpreting, and reflecting and evaluating; for reading, they are acquiring information, developing an interpretation, and evaluating the content and form of a text. Based on the specification, an item bank of 1,050 items was constructed by master teachers. Taking into account differences among regions, between urban and rural areas, among ethnic groups and so on, 114 items were selected, all of them multiple-choice items scored incorrect (0) or correct (1). The matrix sampling method was used to assemble the test batteries in order to decrease testing time and avoid fatigue effects. After two pilot tests, all the CTT and IRT indexes were acceptable.
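Two classical test theory (CTT) indexes commonly checked after a pilot of 0/1-scored items are item difficulty (proportion correct) and discrimination as the corrected item-total correlation (item versus total score on the remaining items). The abstract does not say which CTT indexes were used, so the sketch below is an assumed, illustrative choice; it also assumes every item and rest score has non-zero variance.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ctt_item_stats(scores):
    """scores[person][item] in {0, 1}; returns (difficulty, corrected r_it)."""
    totals = [sum(row) for row in scores]
    stats = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        rest = [t - x for t, x in zip(totals, item)]  # total without item j
        stats.append((sum(item) / len(item), pearson(item, rest)))
    return stats


scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
for difficulty, r_it in ctt_item_stats(scores):
    print(round(difficulty, 2), round(r_it, 3))
```

Excluding the item from the total avoids inflating the discrimination index, which matters most for short tests or, under matrix sampling, for the short blocks each student actually takes.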
Testing the applicability of a theoretical model to predict ITN use in Ethiopia. (142)
Hibist Astatke, Mesfin Mulatu, Peter Buyungo, Daniel Crapper and Tsega Berhanu
About a third of Ethiopia’s population is at risk of malaria infection, and young children represent the most vulnerable segment of the population. Use of insecticide-treated mosquito nets (ITNs) is the primary mode of malaria prevention. Identifying theory-based psychosocial determinants of the intention to use ITNs is important for designing effective malaria prevention programs. A survey was carried out with primary caregivers of children under five in systematically selected households (n = 1,206) in urban/semi-urban and rural areas of southern Ethiopia. Structural equation modeling (SEM) was applied to test the usefulness of Fishbein’s Integrative Model of health behavior, linking psychosocial factors, social marketing communication, and sociodemographic characteristics to the intention to use ITNs. The SEM results provided partial support for the model. As expected, positive attitudes and perceived social norms predicted the intention to use ITNs. Contrary to expectation, perceived internal control and perceived availability were not significantly associated with the intention to use ITNs. Overall, this theory-based model accounted for a large proportion (43%) of the variance in intention. It is concluded that integrative health behavior models can be useful for explaining ITN use and perhaps for developing interventions on ITN use in Ethiopia.
Michalis P. Michaelides
Responding to policies that emphasize national standards and standardized testing in education, recent research in educational assessment has turned in new directions, such as investigating how teachers use formative assessment practices (e.g., Black & Wiliam, 1998; Black et al., 2004) and examining how teachers conceive of assessment and its purposes (e.g., Brown, 2004, 2009). The objective of this research is to collect and analyze data from a national sample of Cypriot teachers of various specializations and ranks using Brown’s (2002) Conceptions of Assessment III Abridged Survey scale. The scale has been adapted into Greek and administered to about 250 teachers. Analysis of the data will focus on the psychometrics of the scale, the structure of the conceptions (assessment for improving learning, for holding students accountable, for holding schools accountable, and as a practice irrelevant to education), differences between demographic groups, and the comparison of the results with findings from other countries. Results will be correlated with data from a survey on teachers’ assessment practices to clarify the link between conceptions and self-reported practices in educational assessment.
Modeling of achievement gaps using international test data. (110)
Bo Zhang and Jehanzeb Cheema
The primary goal of this research is to enhance the understanding of achievement gaps by exploring new factors related to the prevalent achievement gaps among students of different racial, social, and economic backgrounds. To achieve this goal, data from the Programme for International Student Assessment (PISA) will be studied. Along with assessing domain-specific cognitive areas such as science, mathematics, and reading, PISA administers extensive questionnaires to students, schools, and parents, making it possible to identify new factors that may be related to the causes of achievement gaps. A series of hierarchical linear model (HLM) analyses will be conducted to study the impact of factors such as student experience, school expectations, student-school interaction, motivation, parental involvement, and computer access and literacy on achievement gaps in mathematics. Of particular interest is how the characteristics of students interact with the features of schools with regard to achievement differences. Methodologically, this research will explore the impact of missing data and of missing-data treatment methods on HLM modeling.
The creation of the first psychology laboratory in Britain: Cambridge, 1886-1888. (261)
Alexandra Bacopoulos-Viau and John Rust
An interesting episode in the historical evolution of psychometrics took place in Cambridge between 1886 and 1889. James McKeen Cattell, later an eminent psychologist in America, resided in Cambridge between these dates, having been made a “Fellow Commoner” of St John’s College in October 1886. That year, Cattell had completed his PhD (entitled “Psychometric Investigations”) under Wilhelm Wundt at Wundt’s Leipzig laboratory in Germany, and on arriving at Cambridge he made plans to travel back to claim his equipment and buy new apparatus. During his time in Leipzig, Cattell had entered into an enthusiastic correspondence with Francis Galton in London, who had been making enquiries of Wundt about the possibility of using some of his psychophysics equipment in his anthropological investigations. Wundt concentrated on standard psychophysical investigations and did not focus on individual differences, but Cattell did, and he effectively became the go-between for the two evolving streams of thought. By summer 1887, Cattell had a laboratory, a room in the Cavendish physics building in Cambridge, where he supplemented his Leipzig equipment with other apparatus designed by himself and built by Horace Darwin (Charles Darwin’s son) at the Cambridge Scientific Instrument Company. This small, short-lived laboratory was the crucible for Cattell’s exploration of the idea of mental testing, an idea that came to fruition with his publication of “Mental Tests and Measurements” (Mind, 1890). Hence it represents a significant step in the evolution of the science of psychometrics.