Computerized Adaptive Tests: Applications (13)
Chair: Matthieu Brinkhuis, Thursday 11.30 - 12.50, Boys Smith Room, Fisher Building.
Neils Smits, Department of Clinical Psychology, Vrije Universiteit Amsterdam, Harrie Vorst, Department of Psychological Methods, University of Amsterdam, the Netherlands. Alternatives to CAT for a short internet administration of the SVL questionnaire. (177)
Haya Shamir, Erin Phinney Johnson and Kimberly Brown, Waterford Research Institute, Salt Lake City UT, USA. Validity and reliability of the Waterford Assessment of Core Skills. (071)
Aries Yulianto, Faculty of Psychology, University of Indonesia, Jawa Barat, Indonesia. Did computer simulation simulate actual performance on CAT? Comparison between real-data simulation and real CAT performance in Indonesia. (041)
Delphine Courvoisier, Michael Eid and Tanja Lischetzke, University of Geneva, Switzerland. Patterns of compliance to a computerized mobile phone assessment: New insights from Rasch models. (194)
ABSTRACTS
Alternatives to CAT for a short internet administration of the SVL questionnaire. (177)
Neils Smits and Harrie Vorst
The Schoolvragenlijst (SVL; Vorst, 1990) is a popular Dutch questionnaire for students aged 9-16 that measures attitudes toward school and learning processes. It contains 160 items. Recently, an internet version was developed that allows for Computerized Adaptive Testing (CAT). When CAT is performed over the internet, the calculations (estimating θ's, selecting new items, etc.) need to be done on the tester's server. The SVL is typically administered to groups of students. When such groups are assessed simultaneously, the server may be overloaded, which may eventually cause the assessments to take more time instead of less. This project studies whether it is beneficial to use item selection procedures that are not based on such heavy calculations. Three alternative methods of a non-adaptive nature will be used: regression trees (e.g., Yan, Lewis, & Stocking, 2004), incomplete designs (e.g., Smits & Vorst, 2007), and short forms (e.g., Hol, Vorst, & Mellenbergh, 2005). A real-data simulation study is performed using a large data set (N = 12,000) with scores on all the items of the SVL. CAT and the three alternative procedures will be run for several numbers of administered items for all simulees. After the final item is administered, a θ estimate is obtained for all incomplete administrations. The estimates from all procedures will be contrasted with the complete-data estimate. Moreover, it will be studied whether the differences between the outcomes of CAT and those of the other procedures are substantial or only marginal.
References:
Hol, A. M., Vorst, H. C. M., & Mellenbergh, G. J. (2005). A randomized experiment to compare conventional, computerized, and computerized adaptive administration of ordinal polytomous attitude items. Applied Psychological Measurement, 29, 159-183.
Smits, N., & Vorst, H. C. M. (2007). Reducing test length through structurally incomplete designs: An illustration. Learning and Individual Differences, 17, 25-34.
Vorst, H. C. M. (1990). Handleiding en verantwoording bij de schoolvragenlijst (SVL) [User manual of the school questionnaire for primary and secondary education: SVL]. Nijmegen, NL: Berkhout BV.
Yan, D., Lewis, C., & Stocking, M. (2004). Adaptive testing with regression trees in the presence of multidimensionality. Journal of Educational and Behavioral Statistics, 29, 293-316.
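The contrast between CAT and a fixed short form in a real-data simulation can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes dichotomous Rasch items (the SVL items themselves may well be polytomous), an EAP estimate on a fixed grid with a standard-normal prior, and randomly generated "stored" answers. All function and variable names are hypothetical. The point it shows is that the CAT replay re-estimates θ after every item, whereas the short form is scored once.

```python
import math
import random

GRID = [g / 10.0 for g in range(-40, 41)]  # theta grid from -4.0 to 4.0


def rasch_prob(theta, b):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_theta(responses, difficulties):
    """EAP estimate of theta with an (unnormalised) standard-normal prior."""
    num = den = 0.0
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for x, b in zip(responses, difficulties):
            p = rasch_prob(t, b)
            w *= p if x == 1 else 1.0 - p
        num += t * w
        den += w
    return num / den


def replay_cat(stored, difficulties, n_items):
    """Replay one respondent's stored answers through a simple CAT."""
    remaining = list(range(len(difficulties)))
    used = []
    theta = 0.0
    for _ in range(n_items):
        # Rasch item information p(1 - p) is largest for the item whose
        # difficulty is closest to the current theta estimate.
        nxt = min(remaining, key=lambda j: abs(difficulties[j] - theta))
        remaining.remove(nxt)
        used.append(nxt)
        # The estimate is recomputed after every item: this is the per-item
        # work that a server would have to do during a live administration.
        theta = eap_theta([stored[j] for j in used],
                          [difficulties[j] for j in used])
    return theta, used


def score_short_form(stored, difficulties, item_ids):
    """Score a fixed short form: same items for everyone, one scoring pass."""
    return eap_theta([stored[j] for j in item_ids],
                     [difficulties[j] for j in item_ids])


# Illustrative data only: 40 made-up items and one simulated respondent.
rng = random.Random(0)
difficulties = [rng.uniform(-2.0, 2.0) for _ in range(40)]
stored = [1 if rng.random() < rasch_prob(0.8, b) else 0 for b in difficulties]

full = eap_theta(stored, difficulties)                 # complete-data estimate
cat_est, cat_items = replay_cat(stored, difficulties, 10)
short_est = score_short_form(stored, difficulties, list(range(0, 40, 4)))
print(full, cat_est, short_est)
```

Each shortened estimate can then be compared with the complete-data estimate, for example through the absolute difference averaged over respondents.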
Validity and reliability of the Waterford Assessment of Core Skills. (071)
Haya Shamir, Erin Phinney Johnson and Kimberly Brown
The Waterford Assessment of Core Skills (WACS) is an engaging computerized adaptive test for children from PreK through 2nd grade that assesses early reading and pre-reading skills. IRT: Difficulty values for 2,680 test items were calibrated with a Rasch (1-PL) model. Subsequently, 283 items were removed for misfit or for differential item functioning (DIF) indicating gender bias, and 127 misfitting persons (out of 8,800) were removed. Model fit analysis showed a strong fit to the 1-PL model. Validity: Content validity was established against state and national standards. A principal components analysis and an exploratory factor analysis verified that WACS is unidimensional, establishing construct validity. Comparison of student performance on WACS with performance on five commonly used standardized tests that also measure early reading skill established concurrent validity. All correlations between tests are highly significant, ranging from r = .50 to .74. Additional test data will be collected in April 2009 to extend the concurrent validity data and to determine predictive validity. Reliability: The marginal reliability coefficient (r = .93) indicates strong internal reliability. Test-retest reliability will be determined in April 2009. Preliminary test-retest correlations for 1st grade WACS, with a sample size of 85, were high for a computerized adaptive test (r = .73, p < .001).
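For readers unfamiliar with marginal reliability, the sketch below shows one common way to compute it from ability estimates and their standard errors: the share of the variance of the estimates that is not error variance. The abstract does not state which formula was used for WACS, so this definition, the function name, and the toy numbers are assumptions for illustration only.

```python
from statistics import mean, pvariance


def marginal_reliability(theta_hats, standard_errors):
    """(variance of estimates - mean squared standard error) / variance of estimates."""
    var_theta = pvariance(theta_hats)
    mean_error_var = mean([se ** 2 for se in standard_errors])
    return (var_theta - mean_error_var) / var_theta


# Toy numbers, not WACS data: three ability estimates with equal precision.
example = marginal_reliability([-1.0, 0.0, 1.0], [0.2, 0.2, 0.2])
print(round(example, 2))
```

In an adaptive test the standard errors differ across examinees, which is why the error variance is averaged over the sample.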
Did computer simulation simulate actual performance on CAT? Comparison between real-data simulation and real CAT performance in Indonesia. (041)
Aries Yulianto
The use of computers in Indonesia has increased significantly. Nevertheless, computers are rarely used for test administration, so there is an opportunity to develop computerized test administration in Indonesia. With CAT, testing can be more effective and efficient. The efficiency and precision afforded by CAT have typically been studied using computer simulation techniques (Shu-Ying Chen, 2001). However, most researchers would agree that simulations may not fully reflect the reality of examinee performance on a test (Wang, Pan, & Harris, 1999). The objective of this study is to determine how well real-data simulation predicts actual CAT performance. The data were obtained from Yulianto (2006), who conducted an experiment comparing scores on Raven’s Advanced Progressive Matrices administered as a paper-and-pencil test and as a CAT within a 2-week interval. This study used POSTSIM for the real-data simulation and Fasttest Pro for test administration, with 28 college students who took both the paper-and-pencil test and the CAT. Overall, there was no significant difference in test scores between the real-data simulation and real performance. From this finding, it can be concluded that real-data simulation can predict real performance on CAT.
Patterns of compliance to a computerized mobile phone assessment: New insights from Rasch models. (194)
Delphine Courvoisier, Michael Eid and Tanja Lischetzke
Ecological momentary assessment (EMA) is a method that is now widely used to study behavior and mood in the settings in which they naturally occur. It maximizes ecological validity and avoids the limitations of retrospective self-reports. As with all intensive longitudinal studies, compliance may be low. Thus, it is important to determine whether calls at certain times of day increase compliance. To examine the compliance patterns of EMA administered via mobile phone, data were collected on 6 occasions per day for 7 days (N = 305 participants). Each call lasted around one minute. Compliance patterns were analyzed using Rasch models, with each of the 42 calls treated as a success if it was answered and as a failure otherwise. Results show that compliance is generally relatively high (mean compliance: 74.9%), with almost all respondents answering at least half of the calls. Within a day, compliance increases throughout the day and is highest (item difficulty is lowest) after 5 pm. Across days, compliance decreases slightly as the days go by, probably due to a lassitude effect. Neither personality, age, nor sex was significantly related to the latent personality trait of compliance. In conclusion, to maximize compliance, calls should be made after 5 pm even on a cell phone. Furthermore, data from computerized mobile phone assessment can safely be considered as at least missing at random.
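The coding of calls as Rasch "items" can be sketched as follows. This is only a rough illustration, not the authors' analysis: instead of fitting a Rasch model, it approximates each call's difficulty by the centred log-odds of the call being missed, which ignores differences between respondents. The function names, the smoothing constant, and the assumption that calls are stored day by day in time order are all hypothetical.

```python
import math


def call_difficulties(answered):
    """Rough difficulty per call from a respondents-by-calls 0/1 matrix.

    answered[i][j] is 1 if respondent i answered call j, 0 if it was missed.
    Higher values mean a call was harder to comply with.
    """
    n_people = len(answered)
    n_calls = len(answered[0])
    logits = []
    for j in range(n_calls):
        hits = sum(person[j] for person in answered)
        # Smoothing keeps the log-odds finite when a call was answered
        # by everyone or by no one.
        p = (hits + 0.5) / (n_people + 1.0)
        logits.append(math.log((1.0 - p) / p))
    centre = sum(logits) / n_calls
    return [d - centre for d in logits]  # centred so difficulties sum to zero


def mean_by_slot(difficulties, calls_per_day=6):
    """Average difficulty per within-day time slot, assuming day-by-day ordering."""
    slots = [[] for _ in range(calls_per_day)]
    for j, d in enumerate(difficulties):
        slots[j % calls_per_day].append(d)
    return [sum(s) / len(s) for s in slots]


# Toy matrix, not study data: 4 respondents, 2 calls.
toy = [[1, 1], [0, 1], [0, 1], [1, 0]]
toy_difficulties = call_difficulties(toy)
print(toy_difficulties)
```

With the study's layout (42 calls, 6 per day), `mean_by_slot` would give one average per time slot, so a lower value for the last slots would correspond to the higher compliance reported after 5 pm.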