Paradigm lost, Paradigm regained: Statistics in Language Testing

Journal of Language Teaching, 31 (2), 131-139, 1997.

Author: Raphael Gamaroff

ABSTRACT

1. INTRODUCTION

2. “OLD PARADIGM” VERSUS “NEW PARADIGM” RESEARCH

3. NEGOTIATING THE TASK-DEMANDS

4. CONCLUSION

REFERENCES

Abstract

The main issue in educational testing is how to measure accurately the individual differences within language-specific abilities and academic abilities, i.e. how to recognise performance, which has to do with the setting of valid standards. Valid standards should be concerned with fulfilling the relevant purposes of education. In South Africa the use of statistics in evaluation is often linked to the oppression of the disadvantaged. This view seems to be gaining influence among academics and policy makers in South Africa, and for this reason the importance of statistical methods in evaluation needs serious reconsideration in terms of relevance. It is on the issue of relevance that people differ. This is why educational issues in the context of evaluation (e.g. admission tests, placement tests and promotion tests) are beginning to play second fiddle to the more imperious need for sociopolitical transformation. Much is at stake in testing, where evaluations have to be made by human beings of other human beings, and where judgements (often the occasion, if not the cause, of much distress) have to be made about whether somebody should be admitted to an education programme or to a job, or promoted to a higher level. Within the sociopolitical and multi-lingual-cultural-racial-ethnic context of South Africa, these judgements assume an intense poignancy.

1. INTRODUCTION

Language testing is closely related to one’s theory of what language is, which in turn is closely related to one’s theory of how languages are learnt. Thus, in order to answer the question “what are we testing?”, we need to answer the question “what is being learnt?”. And to answer the question “what is being learnt?”, we also need to ask “what are we testing?”.

[In order to] arrive at a greater specificity [of language proficiency], it will now be advantageous to look at the issue from the point of view of the field that is most directly concerned with the precise description and measurement of second language knowledge, namely second language testing (Spolsky, 1989:59; my square brackets).

An important reason for trying to improve learning and teaching is to improve performance on tests, which is not the same thing as teaching to the test. In the former, one is concerned with improving the ability to perform one’s competence; in the latter, one is merely concerned with the ability to “perform” (a test).

A major part of testing is measurement. Because the use of statistics in educational measurement is so controversial, it is necessary to consider rigorously and dispassionately the value of statistics in evaluation, keeping in mind that one of the major challenges in the improvement of education is the creation of a more appropriate and effective system of evaluation (King and Van den Berg, 1993:207), or, to use Rowntree’s (1977:1) term, “assessment”:

If we wish to discover the truth about an educational system, we must look into its assessment procedures. What student qualities and achievements are actively valued and rewarded by the system? How are its purposes and intentions realised? To what extent are the hopes and ideals, aims and objectives professed by the system ever truly perceived, valued and striven for by those who make their way within it? The answers to such questions are to be found in what the system requires students to do in order to survive and prosper. The spirit and style of student assessment defines the de facto curriculum.

What the system “requires students to do” (Rowntree above) is what validity is concerned with, that is, the purpose of (test) behaviour. I adopt the view of validity as a “unitary concept that describes an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (Messick, 1988:2).

What is occurring in South Africa is an effort to downplay statistical (psychometric) measurement, which is linked to resistance to the unpopular notion of the one-off test. Many language researchers and psychologists oppose the use of statistics (Spolsky, 1978, 1985; Macdonald, 1990, 1990a). This opposition, I believe, is having a negative influence on educational policy in South Africa. I shall argue that the opposition to psychometric measurement has not been cogently reasoned out.

According to Nunan (1992:20): “underpinning quantitative research is the positivistic notion that the basic function of research is to uncover facts and truths which are independent of the researcher. Qualitative researchers question the notion of an objective reality.”

As Rist (1977:43) asserts: “ultimately, the issue is not research strategies, per se. Rather, the adherence to one paradigm as opposed to another predisposes one to view the world and the events within it in profoundly different ways.”

In a similar vein, Macdonald (1990:21, 40) contrasts the qualitative and “illuminative” (Parlett, 1981) paradigm of ethnographic research, which she favours, with the psychometric paradigm, which she rejects. The paradigm one chooses predisposes one to view the world in a certain way. Indeed, and in deed, the paradigm is the view.

2. “OLD PARADIGM” VERSUS “NEW PARADIGM” RESEARCH

I examine the psychometric controversy, and its sociopolitical consequences and ethical implications, particularly for South Africa. The controversy involves two opposing views: the one, which holds that statistical measurement is of limited value, is represented by authors such as Spolsky (1978, 1985), Lantolf and Frawley (1988) and Macdonald (1990, 1990a); the other, which holds that statistical measurement is of considerable value, is represented by such authors as Popham (1981), Oller (1979, 1983) and Stevenson (1985).

Spolsky (1978, 1985), Lantolf and Frawley (1988) and Macdonald (1990) maintain that the psychometric paradigm reduces humans to objects. Lantolf and Frawley (1988:181) assert that psychometrics is an “imposition” upon the reality of an open-ended system, because it is “criterion-reductive, analytically-derived and norm-referenced”. Spolsky (1985:33-34) denounces psychometrists as “hocus-pocus” scientists:

In the approach of scientific modern tests, the criterion of authenticity of task is generally submerged by the greater attention given to psychometric criteria of validity and reliability. The psychometrists are ‘hocus-pocus’ scientists in the fullest sense; in their arguments, they sometimes even claim not to care what they measure provided that their measurement predicts the criterion variable: face validity receives no more than lip service… It is in what I have labeled the postmodern period and approach to language testing that the criterion of authenticity of task has come to have special importance. (My emphasis)

For Harrison (1983:84), statistical measurement is inappropriate because of its subjective nature:

Testing is traditionally associated with exactitude, but it is not an exact science… – The quantities resulting from test-taking look like exact figures – 69 percent looks different from 68 percent but cannot be so for practical purposes, though test writers may imply that they are distinguishable by working out tables of precise equivalences of test and level, and teachers may believe them. These interpretations of scores are inappropriate even for traditional testing but for communicative testing they are completely irrelevant. The outcome of a communicative test is a series of achievements, not a score denoting an abstract ‘level’.

Harrison seems to mean that “the quantities resulting from test-taking [which] look like exact figures” appear to measure objectively, but in fact do nothing of the kind; rather, they measure subjectively. For Morrow (1981:12),

[o]ne of the most significant features of psychometric tests as opposed to those of ‘prescientific’ days is the development of the twin concepts of reliability and validity… The basis of reliability claimed by Lado is objectivity. The rather obvious point has, however, not escaped observers… that Lado’s tests are objective only in terms of actual assessment. In terms of the evaluation of the numerical score yielded, and perhaps more importantly, in terms of the construction of the test itself, subjective factors play a large part.
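Harrison’s 69-versus-68 example can itself be stated in statistical terms. In classical test theory the standard error of measurement (SEM) is SD × √(1 − r), where SD is the standard deviation of observed scores and r is the test’s reliability coefficient, and two observed scores are only meaningfully different if they differ by more than the error of their difference. The Python below is a minimal sketch of that calculation, not a procedure used by any of the authors discussed; the function names, the standard deviation of 10 points and the reliability of 0.90 are hypothetical values chosen for illustration, and the 1.96 multiplier assumes normally distributed, independent measurement error.

```python
import math
from typing import Tuple


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence band around one observed score (assumes normal error)."""
    return observed - z * sem, observed + z * sem


def distinguishable(a: float, b: float, sem: float, z: float = 1.96) -> bool:
    """True if two observed scores differ by more than z standard errors of a difference.

    Assumes both scores come from the same test, with independent errors of
    equal size, so the standard error of the difference is sqrt(2) * SEM.
    """
    se_difference = math.sqrt(2.0) * sem
    return abs(a - b) > z * se_difference


if __name__ == "__main__":
    # Hypothetical test: standard deviation 10 points, reliability 0.90.
    sem = standard_error_of_measurement(sd=10.0, reliability=0.90)
    print(f"SEM = {sem:.2f}")
    low, high = score_band(69.0, sem)
    print(f"95% band for 69: {low:.1f} to {high:.1f}")
    print("69 vs 68 distinguishable?", distinguishable(69.0, 68.0, sem))
    print("80 vs 60 distinguishable?", distinguishable(80.0, 60.0, sem))
```

With these made-up values the SEM is about 3.2 points, so a one-point gap such as 69 versus 68 falls well inside the error band, while a 20-point gap does not: the imprecision Harrison points to is something the statistics themselves make explicit.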

Contrary to the negative attitudes towards statistics mentioned above, Oller (1983:37) believes that statistical procedures play an important role in language testing, while Stevenson (1985:112) maintains that most language teachers have a poor knowledge of language testing and educational measurement. The problem, however, is larger than ignorance, and often involves a certain degree of quantophobia. Popham (1981:203) maintains that those who do not know or like statistics often hold the misconception that statistics makes research more difficult. It seems more reasonable to argue that it is more difficult to obtain useful results from qualitative data alone, because such data do not give a complete enough picture of the informants under investigation (Popham, 1981:203).

3. NEGOTIATING THE TASK-DEMANDS

Macdonald’s “Threshold Project” in primary schools in the former Bophuthatswana (now the North West Province), which involved several researchers, has had a strong influence on recent attitudes towards testing in South Africa. The views expressed in the “Threshold Project” are relevant to all testing, from the primary school through to tertiary education.

Macdonald (1990a:46) rejects the Human Sciences Research Council’s (HSRC) norm-referenced test, which serves as a diagnostic tool for pupils entering Grades 4, 5 and 6. She reports that the average scores of pupils in these three grades were 22%, 44% and 66% respectively, and that the HSRC recommends converting the test into a criterion-referenced test on which Grade 4 pupils should score a minimum of 80% to gain admission to Grade 5.
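The difference between the HSRC’s existing norm-referenced test and the proposed criterion-referenced version is a difference in how the same score is read: against the performance of a reference group, or against a fixed standard. The Python below is a minimal sketch of those two readings; only the 80% cut score comes from the HSRC recommendation as reported by Macdonald, while the function names, the group of ten scores and the “percentage scoring below” definition of percentile rank are assumptions made for illustration.

```python
from bisect import bisect_left
from typing import List

# Cut score of 80% as recommended by the HSRC (as reported by Macdonald, 1990a:46).
CUT_SCORE = 80.0


def percentile_rank(score: float, reference_group: List[float]) -> float:
    """Norm-referenced reading: percentage of the reference group scoring below `score`."""
    if not reference_group:
        raise ValueError("reference group must not be empty")
    ordered = sorted(reference_group)
    below = bisect_left(ordered, score)  # number of group scores strictly below `score`
    return 100.0 * below / len(ordered)


def meets_criterion(score: float, cut_score: float = CUT_SCORE) -> bool:
    """Criterion-referenced reading: pass or fail against a fixed standard."""
    return score >= cut_score


if __name__ == "__main__":
    # Hypothetical percentage scores for a group of ten pupils.
    group = [15.0, 22.0, 30.0, 41.0, 44.0, 50.0, 58.0, 66.0, 72.0, 85.0]
    for pupil_score in (44.0, 66.0, 85.0):
        print(
            f"score {pupil_score:.0f}%: "
            f"percentile rank {percentile_rank(pupil_score, group):.0f}, "
            f"meets {CUT_SCORE:.0f}% criterion: {meets_criterion(pupil_score)}"
        )
```

Under these made-up scores a pupil on 66% is ahead of most of the group (a percentile rank of 70) yet fails the 80% criterion, which illustrates how a fixed criterion can exclude pupils who compare well with their peers, the situation behind Macdonald’s first objection.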

Macdonald has two objections to this approach. Firstly, the majority of prospective Grade 5 pupils would probably score far less than 80% on such a test, which would mean that they would be refused admission to Std 3 (Grade 5). The predicament, maintains Macdonald, would then be what to do with these unsuccessful Grade 4 children. This is ultimately a social problem that cannot be resolved by the HSRC or by Macdonald.

Her second objection is that the causal or correlational link between language proficiency and academic achievement is not clear. For Macdonald (1990a:42),

the most difficult connection to make is that between different aspects of English communicative competence and their relation – causal or correlational – to formal school learning through EMI (English as a medium of instruction). If one is able to set up these relationships in a reasoned way – and nobody to our knowledge has gone very far in this task – then the significance of the current test scores would be absolutely transparent. There is a way through this conundrum, and that is to change the nature of the question.

Macdonald’s predicament of what to do with the large number of unsuccessful applicants to higher grades connects with her “conundrum” of the opaque relationship between language proficiency and academic achievement. What these two problems have in common is the question of individual success, which is “one of the major conundrums in the Second Language Acquisition (SLA) field” (Larsen-Freeman & Long, 1990:153; see also Diller, 1981).

Macdonald claims that the HSRC’s tests are invalid, firstly, because there would be many children excluded from school if the HSRC’s recommendation were to be followed, and secondly, because the relationship between language proficiency and academic achievement is not clear.

With regard to the HSRC’s tests, the HSRC equates failing Grade 4 with a lack of sufficient academic ability. This lack, I suggest (along with many other authors, for example Cummins, 1979, 1980; Collier, 1987), is intimately tied up with the lack of development of what Cummins (1983) calls Cognitive and Academic Language Proficiency (CALP), which is closely connected to the development of cognitive abilities through the mother tongue. There is a close relationship between the CALP strategies (which are usually learnt in an artificial, i.e. tutored, situation) developed through a first or second language and the strategies that are used to learn any other academic subject. All these strategies are rooted in the ability of learning how to learn.

Macdonald rejects the HSRC’s “old” paradigm of psychometric testing and suggests that all these children should be allowed into Std 3 (and the higher standards?) regardless of their ability as measured by tests. The implication of the HSRC’s test policy, however, seems to be that passing a standard should not be equated with academic success: children may pass all their standards (often through automatic promotion), but this cannot be regarded as authentic success. According to the editorial of Educamus (1990), there is a low failure rate from preschool to Grade 11 in DET secondary schools, because low-ability pupils in many DET schools are automatically promoted through the standards, except for the final Grade 12 external examination.

A third problem for Macdonald (1990a:46) is that

doing things in such a post hoc way [namely, the HSRC’s psychometric tests] would fail to force us into analyzing the nature of the learning that the child has to be able to meaningfully participate in… we would have described a test and some external criteria and identified children through the use of these – but we would have failed to explain what it is the children have to be able to do. (My square brackets)

Macdonald (above) is contrasting the “post hoc” psychometric paradigm of the HSRC, which “fail[s] to explain what it is the children have to be able to do”, with her “negotiating the task-demands”, which she claims does explain what children have to be able to do. This raises the question: what is a real, authentic, natural task? For Macdonald the answer lies in “negotiating the task-demands”.

Macdonald’s (1990a:46) solution to her three problems mentioned above is to replace the “outdated and rigid modes of curriculum development in South Africa”, such as psychometric measurement (norm-referenced and criterion-referenced tests) and the general ability of communicative proficiency, with “negotiating the task-demands of Std 3 [Grade 5]”, which involves “going from one situation (and knowledge domain) to another to see how the curriculum in its broadest sense has been constituted, and which aspects are negotiable”. Examples of such task-demands are (Macdonald, 1990a:47):

1. Following a simple set of instructions for carrying out a task.

2. Showing command of a range of vocabulary (in semantic clusters) from across the curriculum.

3. Solving problems involving logical connectives.

4. Being able to show comprehension of simple stories and information books.

But do we know what a real-life task is (not merely what it looks like)? And if we did, would we know whether it is necessary to do real-life tasks in order to learn to do them, or to prove that we are proficient in doing them? This question lies at the heart of the problem of what an authentic language activity, that is, an authentic test, is. Alderson (1983:90), in the context of communicative (that is, real-life) tests, maintains that we do not yet know what communicative tests are, because “we stand before an abyss of ignorance”. The criteria that may provide some help come from “theory” and “from what one knows and believes about language and communicative abilities, and from what one knows and believes about communication with and through language” (Alderson, 1983:91). What Alderson maintains about communicative tests may also be true of negotiating Macdonald’s “task-demands” above. Thus it does not seem wise to try to separate – as Macdonald suggests – (general) communicative proficiency from a task-demand such as “showing command of a range of vocabulary (in semantic clusters) from across the curriculum” (Macdonald, 1990a:47). After all, the most demanding part of “negotiating the task-demands” is often the (general) communicative proficiency part, especially for pupils with limited English proficiency. Black students often have more problems with general background knowledge than with new knowledge. For this reason, a radical separation should not be made between a Language for Specific Purposes task and a general proficiency task.

Fodor (1980:149) suggests that theory has so far not been of much help in redeeming our knowledge and beliefs from the abyss: “there simply isn’t any theory of how learning can affect concepts”. This implies that we are not clear about how to test concepts, because if we are not clear about how concepts are learnt, we cannot be clear about how they are tested. And to test language is to test concepts. Thus, from the theoretical perspective, the HSRC’s psychometric paradigm is, to say the least, no worse than Macdonald’s “negotiating the task-demands”. Macdonald’s (1990a:46) argument, as mentioned earlier, is that the HSRC tests do not tap what learners “have to be able to do”. The problem is that the connection between the activity of doing “old” paradigm tests, such as those used by the HSRC, and the “new” paradigm activity of “negotiating task-demands” is far from clear.

Macdonald’s (1990a:15, 28, 31, 39) statistical data, oddly, are dealt with under the rubric of “qualitative” data, which explains why Macdonald appears to be paying only lip service to quantitative data.

The difficulty in statistical research is trying to be both group-oriented and individual-oriented. Whatever the inadequacies of statistics, the best argument for its usefulness is that much of academic evaluation ultimately ends up as a score, and if that is the brute fact of the matter, we might as well try to measure this score properly. Having said that, it is undeniable that “true ethnography demands as much training skill” (Nunan, 1992:53) as statistical measurement. What is important is that statisticians and ethnographers both realise that each has a crucial – and complementary – contribution to make to the human sciences. Lip service, either to “face validity” (that is, “real-life”; see Spolsky, 1978, 1985) or to psychometrics, does a disservice to both.

In the last decade there have been attempts to make educational research more “human”, and from these attempts has sprung the conflict between the orthodox scientific and objective methods of experimental research and statistical analysis, on the one hand, and “new paradigm research” (Reason & Rowan, 1981), on the other.

Below is a summary of the salient features of “new paradigm research” (Reason & Rowan, 1981:xiv-xvi):

1. There is too much “quantophrenia” going on. The emphasis should fall on human significance, not on statistical significance. Researchers should become involved in the human side of the phenomenon under study, because the person behind the data can often upset the neat statistics. This means that people should not be reduced to variables or to operational definitions in order to be manipulated into a research design.

2. Care must be taken not to make outlandish generalisations from unrepresentative samples.

3. Safe, respectable research should be avoided.

4. Fear of victimisation may cause the researcher to pick only those bits of research that will impress and please.

5. Science requires the humility to change one’s views in the light of better theories or new observations.

Reason and Rowan’s (1981) view is that statistical (quantitative, objective) research and “human” (qualitative, subjective) research are complementary. Rutherford (1987) echoes Reason and Rowan’s misgivings about the danger of reducing humans to objects. Rutherford (1987:65) quotes the physicist Niels Bohr: “Isolated material particles are abstractions, their properties being definable and observable only through their interaction with other systems.” Rutherford’s message is that (the testing of) humans cannot be isolated into parts. This is probably true. But what should be in dispute as far as language is concerned is not whether language should be tested through its (reductive, mechanistic) parts or through the (organic) whole, but how the parts and the whole interact, which is the basic problem not only of testing but also of learning, knowing, and being (human).

4. CONCLUSION

Statistics is a contentious and often an odious issue in the human sciences. However, without (hard) statistical evidence, that is, quantitative evidence, language evaluation – in fact all educational evaluation, in my view – would be reduced to a fistful of profiles, case studies and anecdotes. Thus, to split quantitative and qualitative research into separate paradigms is symptomatic of the urge to find strict oppositions where there are none, an urge that originates in the human, and humanistic, fear of “reductionism”, that is, the fear of disempowerment. Ironically, this fear of reductionism, and the efforts to prevent it, end up being the most reductionist – and antihumanist – effort of all. In South Africa, the use of statistics in evaluation is often linked to the oppression and the reduction of power of the “disadvantaged” (a euphemism for “blacks”). This view seems to be gaining influence among policy makers in South Africa, and for this reason the importance of statistical methods in evaluation needs serious reconsideration.

The paradigm of qualitative research is safe and respectable, because it describes; the paradigm of quantitative research is neither safe nor respectable, because it prescribes. Yet, in order to make moral, political and economic sense of evaluation, both paradigms (a “holodigm”!) are necessary. The reduction of one paradigm leads to the reduction of the other. Accordingly, the suggestion that quantitative (psychometric) measurement be replaced by qualitative methods such as “negotiating the task-demands” (Macdonald, 1990a:46) might be unwise. Psychometrics and “negotiating the task-demands” should work hand in hand.

The main issue in educational testing is how to measure accurately the individual differences within language-specific abilities and academic abilities, i.e. how to recognise performance, which has to do with the setting of valid standards, i.e. with what one considers relevant to fulfilling the purposes of education. And it is on this issue of relevance that people differ. This is why educational issues, in this context evaluation (e.g. admission tests, placement tests and promotion tests), are beginning to play second fiddle to the more imperious need for sociopolitical transformation, whose nom de guerre is “empowerment”.

It seems that there are two irreconcilable world views, or paradigms: the HSRC’s and Macdonald’s. However, this is no cause for alarm, because science and academia are generated by – and seemingly thrive on (unlike politics) – incompatible theories, e.g. Chomsky versus Piaget, Piaget versus Vygotsky, Vygotsky versus Chomsky; and in testing, Spolsky, Macdonald, Lantolf and Frawley versus Oller, Stevenson and Popham.

Much is at stake in testing, where evaluations have to be made by human beings of other human beings, and where judgements (often the occasion, if not the cause, of much distress) have to be made about whether somebody should be admitted to an education programme or to a job, or promoted to a higher level. Within the sociopolitical and multi-lingual-cultural-racial-ethnic context of South Africa, these judgements assume an intense poignancy.

REFERENCES

Alderson, J.C. (1983). Who needs jam? In: Hughes, A. & Porter, D. (eds.). (1983). Current developments in language testing. London: Academic Press.

Collier, V.P. (1987). Age and rate of acquisition of second language for academic purposes. TESOL Quarterly, 21/4, pp. 617-641.

Cummins, J. (1979). Linguistic interdependence and the educational development of bilingual children. Review of Educational Research, 49, pp. 222-251.

Cummins, J. (1980). The cross-lingual dimensions of language proficiency: Implications for bilingual education and the optimal age issue. TESOL Quarterly, 14/2, pp. 175-187.

Cummins, J. (1983). Language proficiency and academic achievement. In: Oller, J.W. (Jr.) (ed.). (1983). Issues in language testing research. Rowley, Massachusetts: Newbury House.

Diller, K.C. (ed.). (1981). Individual differences and universals in language learning aptitude. Rowley, Massachusetts: Newbury House.

Educamus. (1990). Editorial: internal promotions, 36/9, p. 3. Pretoria: Department of Education and Training.

Fodor, J.A. (1980). Fixation of belief and concept acquisition. In: Piattelli-Palmarini, M. (ed.). (1980). Language and learning: The debate between Jean Piaget and Noam Chomsky. London: Routledge & Kegan Paul.

Harrison, A. (1983). Communicative testing: Jam tomorrow? In: Hughes, A. & Porter, D. (eds.). (1983). Current developments in language testing. London: Academic Press.

Hutchinson, T. & Waters, A. (1987). English for specific purposes: A learning-centred approach. Cambridge: Cambridge University Press.

King, M. & Van den Berg, O. (1993). The Independent Examinations Board, August 1989-February 1992: A narrative. In: Taylor, N. (ed.). (1993). Inventing knowledge: Contests in curriculum construction. Cape Town: Maskew Miller Longman.

Lantolf, J.P. & Frawley, W. (1988). Proficiency: Understanding the construct. Studies in Second Language Acquisition (SSLA), 10/2, pp. 181-195.

Larsen-Freeman, D. & Long, M. H. (1990). An introduction to second language acquisition research. New York: Longman.

Macdonald, C.A. (1990). Crossing the threshold into standard three in black education: The consolidated main report of the Threshold Project. Pretoria: Human Sciences Research Council (HSRC).

Macdonald, C.A. (1990a). English language skills evaluation (A final report of the Threshold Project), Report Soling-17. Pretoria: Human Sciences Research Council (HSRC).

Messick, S. (1988). Meaning and values in test validation: The science and ethics of measurement. Princeton, New Jersey: Educational Testing Service.

Morrow, K. (1981). Communicative language testing: Revolution or evolution? In: Alderson, J. (ed.). (1981). Issues in language testing. ELT Documents. The British Council.

Nunan, D. (1992). Research methods in language learning. Cambridge: Cambridge University Press.

Oller, J.W. (Jr.). (1979). Language tests at school. London: Longman.

Rist, R. (1977). On the relations among educational research paradigms: from disdain to détente. Anthropology and Education Quarterly, 8, pp. 42-49.

Rowntree, D. (1977). Assessing students: How shall we know them? London: Harper & Row.

Rutherford, W.E. (1987). Second language grammar: Learning and teaching. London: Longman.

Spolsky, B. (1978). Approaches to language testing. In: Spolsky, B (ed.). Advances in Language Testing Series, 2. Arlington, Virginia. Center for Applied Linguistics.

Spolsky, B. (1985). The limits of authenticity in language testing. Language Testing, 2, pp. 31-40.

Spolsky, B. (1989). Conditions for second language learning. Oxford: Oxford University Press.

Stevenson, D.K. (1985). Pop validity and performance testing. In: Lee, Y., Fok, A., Lord, R. & Low, G. (eds.). (1985). New directions in language testing. Oxford: Pergamon.
