|
Test Bias
By test bias, we mean a formalization of the intuitive idea that a test is less valid for one group of examinees than for another group in its attempt to assess examinee differences in a prescribed latent trait, such as mathematics ability. Test bias results from individually biased items acting in concert through a test-scoring method, such as number correct, to produce a biased test. In another article of ours, this new conceptualization of test bias is used to undergird a new statistical test (SIBTEST) for psychological test bias (Shealy & Stout, in press). SIBTEST can be used to detect DIF, DTF, item bias, or test bias. A large-scale simulation study of the performance properties of this statistical procedure was also conducted, in particular comparing it with the Holland and Thayer (1988) modification of the Mantel-Haenszel test. SIBTEST software with an accompanying manual can be obtained from Stout.
It may be useful at this point to consider the language that we ordinarily use in connection with studies of item and test bias, in particular the use of the word bias itself. Several authors have defined item bias in very much the same way, essentially: An item is biased if equally able (or proficient) individuals, from different groups, do not have equal probabilities of answering the item correctly.
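The verbal definition above can be written compactly. The notation here is an assumption introduced for illustration and does not come from the source: $U_i$ is the scored response to item $i$ (1 for correct), $\theta$ is the ability the test is meant to measure, and $G$ indicates membership in the reference group $R$ or the focal group $F$. Under these assumptions, item $i$ is unbiased when

```latex
P(U_i = 1 \mid \theta, G = R) \;=\; P(U_i = 1 \mid \theta, G = F)
\quad \text{for all } \theta ,
```

and biased when the two conditional probabilities differ for at least some values of $\theta$.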
The general notion of conditioning on a criterion variable in the process of examining group differences is also illustrated in the Cole (1973) model of test bias, in which, for a given score on the criterion, the average test scores of the two groups being compared are examined. When explicit matching is carried out on an observed score, as it is for the chi-square, Mantel-Haenszel, and standardization methods, it appears that the matching variable should be stratified as finely as possible, consistent with the amount of data on hand. Using broad strata would allow group differences within the strata to dilute the effect of the matching.
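To make the matching idea concrete, the following is a minimal sketch of the Mantel-Haenszel common odds ratio for a single item, with one stratum per distinct matching score. It is only the basic estimator, not the full Holland and Thayer procedure and not SIBTEST. The function names, the group labels "ref" and "focal", and the example counts are all hypothetical.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Estimate the Mantel-Haenszel common odds ratio for one item.

    records: iterable of (group, matching_score, correct), where group is
    "ref" or "focal" and correct is 0 or 1. Each distinct matching_score
    is its own stratum, i.e. the finest stratification the data allow.
    """
    # counts[score][group] = [number incorrect, number correct]
    counts = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, correct in records:
        counts[score][group][correct] += 1

    numerator = 0.0
    denominator = 0.0
    for table in counts.values():
        ref_wrong, ref_right = table["ref"]
        focal_wrong, focal_right = table["focal"]
        n = ref_wrong + ref_right + focal_wrong + focal_right
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n

    if denominator == 0:
        raise ValueError("odds ratio undefined: no informative strata")
    return numerator / denominator


def make_records(group, score, n_right, n_wrong):
    """Hypothetical helper: expand counts into individual records."""
    return [(group, score, 1)] * n_right + [(group, score, 0)] * n_wrong


# Made-up counts: within each score stratum the odds of a correct answer
# are three times higher for the reference group.
example = (
    make_records("ref", 1, 10, 10) + make_records("focal", 1, 5, 15)
    + make_records("ref", 2, 16, 4) + make_records("focal", 2, 8, 6)
)
print(mantel_haenszel_odds_ratio(example))  # close to 3.0
```

A value near 1 indicates that matched examinees from the two groups have similar odds of answering correctly. Values above 1 favor the reference group and values below 1 favor the focal group.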

Test bias occurs because of the presence of nuisance determinants possessed in differing amounts by different examinee groups. Through the presence of these nuisance determinants, bias is then expressed in one or more items. A third feature, also possible because of our multidimensional modeling approach, is that a careful distinction is made between genuine test bias and nonbias differences in examinee group performance that are caused by examinee group differences in target ability distributions. It is important that the latter not be mistakenly labeled as test bias.
In particular, this can be true even if each individual item displays only a minor amount of item bias. For example, word problems on a mathematics test that are too dependent on sophisticated written English comprehension could combine to produce pervasive test bias against English-as-a-second-language examinees.
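The accumulation effect can be illustrated with a small calculation. The sketch below assumes a hypothetical two-dimensional logistic item model in which each item depends mainly on the target ability and weakly on a nuisance determinant. The model form, the loading of 0.2, the 20 item difficulties, and all names are assumptions chosen for illustration, not the model used in the source.

```python
import math


def p_correct(theta, eta, difficulty, nuisance_loading):
    """Hypothetical item model: target ability theta, nuisance eta."""
    logit = theta - difficulty + nuisance_loading * eta
    return 1.0 / (1.0 + math.exp(-logit))


def expected_number_correct(theta, eta, difficulties, nuisance_loading):
    """Expected number-correct score: sum of item success probabilities."""
    return sum(p_correct(theta, eta, b, nuisance_loading) for b in difficulties)


# Assumed test: 20 items, every one with a small nuisance loading.
difficulties = [-1.0 + 0.1 * i for i in range(20)]
loading = 0.2

# Two examinees with the SAME target ability but different amounts of the
# nuisance determinant (e.g. written English comprehension).
theta = 0.0
eta_ref, eta_focal = 0.0, -1.0

per_item_gaps = [
    p_correct(theta, eta_ref, b, loading) - p_correct(theta, eta_focal, b, loading)
    for b in difficulties
]
test_gap = (
    expected_number_correct(theta, eta_ref, difficulties, loading)
    - expected_number_correct(theta, eta_focal, difficulties, loading)
)

print(f"largest single-item gap in P(correct): {max(per_item_gaps):.3f}")
print(f"gap in expected number-correct score:  {test_gap:.3f}")
```

Under these assumptions no single item differs by more than 0.05 in probability of a correct answer, yet the gaps all point the same way, so the expected number-correct scores of two equally able examinees differ by nearly a full point.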
|