Examining bias in a test of academic literacy: Does the Test of Academic Literacy Levels (TALL) treat students from English and African language backgrounds differently?
Responsible test design requires close examination of several parameters of a test. A clearly argued, rational basis (construct) for the ability being tested must first be established. This construct is then articulated in detailed specifications for subtests and item types, and benchmarks are set for test reliability and item productivity. Even then, a number of further dimensions of the test still require attention once the results become available. This article examines one such dimension, Differential Item Functioning (DIF). It asks whether the test under consideration is biased against a particular group of test-takers (testees), so that some of its items or task types unfairly disadvantage them. The results of a large group of first-year students, the bulk of the intake at one South African university, are analysed across four years (2005-2008). DIF varies both across these years and across task types (subtests), and these variations call for specific explanations. The findings suggest that test results should be examined in depth, so as to avoid conclusions that may be fashionable but inaccurate. Ultimately, however, the argument returns to the defensibility of the test construct: what it should legitimately include and, by extension, what the test should measure.
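The abstract does not state which DIF procedure the study used. Purely as an illustration of what a DIF statistic measures, the sketch below implements the Mantel-Haenszel procedure, one widely used method. It is not presented as the author's method. Test-takers are matched on total test score, which is an assumption made here. Within each score level, it compares the odds of answering an item correctly for a reference group and a focal group. The function name `mantel_haenszel_dif` and the group labels `"ref"` and `"focal"` are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(item, total, group, reference="ref", focal="focal"):
    """Illustrative Mantel-Haenszel DIF for one dichotomously scored item.

    item:  1/0 (correct/incorrect) responses to the studied item
    total: total test scores, used here (an assumption) as the matching criterion
    group: group label for each test-taker (hypothetical labels "ref"/"focal")
    Returns (alpha_MH, MH D-DIF on the ETS delta scale).
    """
    # One 2x2 table per score stratum: [A, B, C, D] =
    # [reference correct, reference wrong, focal correct, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for correct, score, g in zip(item, total, group):
        t = tables[score]
        if g == reference:
            t[0 if correct else 1] += 1
        elif g == focal:
            t[2 if correct else 3] += 1

    # Common odds ratio pooled over strata: sum(A*D/N) / sum(B*C/N)
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")

    alpha = num / den
    # Negative values mean the item favours the reference group
    delta = -2.35 * math.log(alpha)
    return alpha, delta
```

An `alpha_MH` of 1 (MH D-DIF of 0) means that matched test-takers from the two groups have equal odds of success on the item. In the ETS classification, absolute MH D-DIF values below 1 are conventionally treated as negligible.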
Keywords: test design, subtests, item types, Differential Item Functioning (DIF), bias, test results, defensibility, measurement