Developing scales and diagnoses (41)

Chair: Jay Verkuilen, Friday 24th July, 11.30 - 12.50, Dirac Room, Fisher Building.

Kun-Shia Liu, Ying-Yao Cheng, Li-Ming Chen and Ching-Wen Hsueh, Graduate Institute of Education, National Sun Yat-sen University, Taiwan. Cut-off points for diagnosing different forms of school bullying. (204)

Ying-Yao Cheng, Kun-Shia Liu, Su-Hsiang Chung and Yih-Rou Chang, Graduate Institute of Education, National Sun Yat-sen University, Taiwan. Development of the Science Gender Stereotype Inventory and the Science Identification Inventory. (205)

Jay Powell and Allan P. Clapp, Robert Morris University, Pittsburgh, USA. Comparing Response Spectrum Analysis Interpretation (RSAI) of test results with Item Response Theory (IRT) and Differential Item Functioning (DIF) using the same test data. (112)

Annouschka Laenen, Ariel Alonso, Geert Molenberghs and Tony Vangeneugden, Hasselt University, Belgium. Estimating the reliability of rating scales from clinical trial data. (257)

ABSTRACTS

Cut-off points for diagnosing different forms of school bullying. (204)
Kun-Shia Liu, Ying-Yao Cheng, Li-Ming Chen and Ching-Wen Hsueh
This study aims to establish suitable cut-off points for different forms of bullying from a diagnostic perspective. The School Bullying Scale (SBS) was developed to measure four forms of bullying: physical, verbal, relational, and cyber bullying. Participants were 735 students in grades 7 to 12 from seven high schools in Taiwan. The cut-off points were determined by combining scores obtained from a Rasch model analysis with counselors' consensual judgment. Results indicated that the different forms of bullying exhibited different levels of severity, and that it is therefore necessary to choose different cut-off points for each form of bullying according to the purpose of the diagnosis.
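The abstract does not specify how the Rasch scores and the counselors' judgments were combined. A minimal sketch of one plausible procedure (Python; the function name, the use of Youden's J, and the simulated data are illustrative assumptions, not the authors' method): choose the cut-off on the Rasch person measures, in logits, that best agrees with the counselors' binary flags.

    import numpy as np

    def choose_cutoff(measures, flagged):
        """Return the cut-off on Rasch person measures (logits) that best
        separates counselor-identified cases from non-cases, scored by
        Youden's J = sensitivity + specificity - 1."""
        measures = np.asarray(measures, dtype=float)
        flagged = np.asarray(flagged, dtype=bool)
        best_cut, best_j = None, -np.inf
        for cut in np.unique(measures):
            pred = measures >= cut  # diagnose at or above the candidate cut
            sens = pred[flagged].mean() if flagged.any() else 0.0
            spec = (~pred[~flagged]).mean() if (~flagged).any() else 0.0
            j = sens + spec - 1.0
            if j > best_j:
                best_cut, best_j = cut, j
        return best_cut, best_j

    # Toy data: hypothetical person measures for one form of bullying and
    # noisy counselor flags; both are simulated, not the study's data.
    rng = np.random.default_rng(0)
    measures = rng.normal(0.0, 1.0, 200)
    flagged = measures + rng.normal(0.0, 0.5, 200) > 0.8
    cut, j = choose_cutoff(measures, flagged)
    print(f"cut-off = {cut:.2f} logits, Youden J = {j:.2f}")

A purpose-dependent diagnosis, as the abstract suggests, would simply weight sensitivity and specificity differently in the criterion rather than using J as written here.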

Development of the Science Gender Stereotype Inventory and the Science Identification Inventory. (205)
Ying-Yao Cheng, Kun-Shia Liu, Su-Hsiang Chung and Yih-Rou Chang
This study develops and validates two instruments for measuring components of gender stereotype threat in science learning: the Science Gender Stereotype Inventory (SGSI), which measures individuals' awareness of gender stereotype threat in science learning, and the Science Identification Inventory (SII), which measures individuals' attitudes toward science learning. The two inventories were administered to 604 eighth graders (303 boys and 301 girls) in Taiwan. Rasch modeling and structural equation modeling analyses indicated that the two inventories exhibited adequate model-data fit, high reliability, and good criterion-related validity, and that their factor structure was invariant across genders. These inventories provide useful tools for understanding individuals' levels of susceptibility to gender stereotype threat in science learning.

Comparing Response Spectrum Analysis Interpretation (RSAI) of test results with Item Response Theory (IRT) and Differential Item Functioning (DIF) using the same test data. (112)
Jay Powell and Allan P. Clapp
Response Spectrum Analysis Interpretation (RSAI) is a procedure that permits a detailed interpretation of every answer on multiple-choice (M/C) tests. It can therefore supply diagnostic information to teachers about every student in their classes from broad-spectrum, large-scale testing programs. The current practice of scoring these tests right/wrong and interpreting them with Item Response Theory (IRT) and/or Differential Item Functioning (DIF) analyses may not yield this depth of information. This study compares the amount of information captured by each of the three approaches to test interpretation, in order to determine which is the most useful for informing teaching and improving educational effectiveness.

Estimating the reliability of rating scales from clinical trial data. (257)
Annouschka Laenen, Ariel Alonso, Geert Molenberghs and Tony Vangeneugden
Rating scales are frequently used as primary or secondary outcome measures in clinical studies where interest lies, for example, in measuring depression, anxiety, or quality of life. Because the reliability of an instrument is population-dependent, it is useful to evaluate the reliability of the measurements obtained within the actual study. However, such studies often have complex longitudinal designs requiring complex measurement models. Classical test theory is then inappropriate, and generalizability theory would, in a longitudinal framework, imply unrealistic assumptions about the variance structure, the error correlations, and the missing-data pattern. Linear mixed models, on the other hand, provide a framework that handles longitudinal data in a very flexible way. In the presentation we discuss an approach to reliability based on this modeling family. An extension of the concept of reliability to a general measurement model is proposed and, based on this, two reliability coefficients are introduced. The first, R_T, expresses the average reliability over a sequence of repeated measurements; the second, R_Λ, gives the reliability of the information obtained by considering the entire sequence of measurements jointly. The methodology is illustrated on a real case study.
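For orientation, a sketch of the underlying idea in our own notation (an illustrative assumption; the talk's exact definitions of R_T and R_Λ may differ). In classical test theory the reliability of a single measurement is the ratio of true-score variance to total variance,

    R = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_\varepsilon^2}.

Under a linear mixed model for subject i's sequence of p measurements,

    Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i),

the covariance of the observations decomposes as \mathrm{Var}(Y_i) = Z_i D Z_i^{\top} + \Sigma_i, so reliability-type coefficients contrast the systematic component Z_i D Z_i^{\top} with the total. Averaging such a ratio over the p occasions gives an R_T-type coefficient, while summarizing the whole p \times p decomposition jointly gives a sequence-level, R_Λ-type coefficient.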