Psychometrics and Reductionism in Language Assessment

Author: Raphael Gamaroff

Much of the content of this unpublished article appears in “Paradigm lost, paradigm regained: Statistics in language testing”, on this WordPress site.


Choosing a research paradigm

Opposition to psychometrics

Norm-referenced tests

What did the others get?




The basic problem in language assessment remains how to assess individual differences within language-specific abilities. In modern democracies, psychometrics in language assessment, and in educational assessment in general, is eschewed by many who prefer an “ethnographical”, or “naturalist”, method of assessment. This paper discusses the opposition to psychometric (statistical) assessment in testing, with special reference to language testing, and defends the use of psychometrics. It argues that psychometric methods and ethnographical, or “naturalist”, methods of assessment both have a crucial and complementary role to play. The history of the quantitative/qualitative controversy can be viewed from two diametrically opposite angles: (1) qualitative research has been dominated by quantitative research for many decades and is only in recent years becoming accepted as a legitimate scientific approach; or (2) quantitative research has been for more than two decades challenging qualitative methods and also setting itself up as the only legitimate form of research.

Choosing a research paradigm

Philosophy and science are saddled with two contrasting paradigms: the empiricist/objective/reductionist paradigm and the ethnographical/subjective/holistic paradigm. The first paradigm, the “standard account”, involves putting questions directly to Nature and letting it answer: the paradigm of empiricist, or normal, science and the Age of Enlightenment. This paradigm is based on three assumptions: (i) naive realism, i.e. the reality of objects is separate from observation, (ii) the existence of a universal scientific language, and (iii) the correspondence theory of truth, i.e. propositions about the world are true if they correspond to what is out there; theories about the world must be inferred from observation. An alternative paradigm, the “seamless web”, provides different answers to those offered by the first paradigm. This alternative paradigm has various sectarian aliases: naturalistic, inductivist, postpositivistic, ethnographical, phenomenological, subjective, qualitative, hermeneutic, humanistic and actor-network. The “seamless web” protagonists accuse reductionists of tearing things from their context.

The kind of assessment procedures one uses is a window onto what abilities are valued and rewarded (Rowntree, 1977:1). I shall argue that it is not only possible to reconcile the two antagonistic paradigms described above but also necessary to do so, if we wish to discover any truth about language assessment, and assessment in general.

According to Nunan (1992:20),

[u]nderpinning quantitative research is the positivistic notion that the basic function of research is to uncover facts and truths which are independent of the researcher. Qualitative researchers question the notion of an objective reality.

Opposition to psychometrics

The increasing number of studies in a purely “ethnographical/ sociolinguistic approach to language proficiency assessment” is witness to the opposition to the “positivistic” notion of “quantitative research” (Nunan above).

The opposition to psychometrics is closely connected to the opposition to “reductionist approaches to communicative competence” (Lantolf & Frawley, 1988:182).

Spolsky threw decorum to the wind and referred to “psychometrists” as “hocus-pocus” scientists. For Spolsky, psychometrics was no more than sleight-of-hand psychometry. Spolsky’s recent “postmodern” approach to psychometrics is that it should be used in conjunction with “humanist” approaches (such as those of Lincoln and Guba, and Groddeck described above).

There is another reason – less philosophical than the reasons given above – why psychometric measurement is eschewed in language testing and in language research in general, namely that most language teachers (and many language researchers) have a poor knowledge of language testing and educational measurement and are consequently “metrically naive” (Stevenson, 1985:112; Bonheim, 1997). Yeld (1987:78) speaks of those “who have not been trained in the use of techniques of statistical analysis and are suspicious of what they perceive as ‘number-crunching’” and for this reason prefer “face validity”. There is also a certain fear of the objectivity of numbers – of not getting (or of not being seen to be getting) them right: mistakes of judgement are much easier to detect in (“objective”) quantitative assessment than in (“subjective”) qualitative assessment.

Norm-referenced tests

The main problem in assessment is how to assess individual people. The individual needs the norm and the norm needs the individual: one without the other is an abstraction from social reality.

Norm-referenced tests can be distinguished from criterion-referenced and individual-referenced tests (Ur, 1996:245-246):

1. Norm-referenced tests are concerned with how well an individual performs compared to a group which he or she is a member of. This is traditional psychometric testing.

2. Criterion-referenced tests are concerned with how well an individual performs relative to a fixed criterion, e.g. how to ask questions. This is what Cziko (1982:27) calls “edumetric” testing.

3. Individual-referenced tests are concerned with how individuals perform relative to their previous performance or to an estimate of their ability.

The emphasis in this discussion is on norm-referenced tests. Norm-referenced tests are important because without data on the variance between individuals within a group, it is not possible to separate what an individual knows (which is the concern of criterion-referenced tests) from what other people know. Nor can individual-referenced tests be separated from what other people know.
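The three kinds of reference listed above can be illustrated with a small worked example. What follows is only a minimal sketch under stated assumptions: the function names, the group's scores, the pass mark of 75 and the previous score of 85 are all hypothetical and do not come from Ur or from any real test. The standard (z) score is used here as one common way of expressing a norm-referenced result; it is not the only one.

```python
from statistics import mean, pstdev


def norm_referenced(score, group_scores):
    """Standard (z) score: distance from the group mean in standard deviations."""
    spread = pstdev(group_scores)
    if spread == 0:
        # No variance in the group, so there is nothing to compare against.
        return 0.0
    return (score - mean(group_scores)) / spread


def criterion_referenced(score, cutoff):
    """True if the score reaches a fixed criterion (a hypothetical pass mark)."""
    return score >= cutoff


def individual_referenced(score, previous_score):
    """Gain (or loss) relative to the same person's earlier performance."""
    return score - previous_score


# Hypothetical data: one learner's score of 80 read in three ways.
group = [50, 60, 70, 80, 90]
score = 80

print(round(norm_referenced(score, group), 2))  # position relative to the group
print(criterion_referenced(score, cutoff=75))   # meets the fixed criterion?
print(individual_referenced(score, 85))         # change since the previous test
```

The same raw score yields three different statements: above the group mean, above the pass mark, and below the learner's own previous result. Only the first of these needs data on the variance within the group.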

Rowntree (1977:185) explains the importance of the norm in assessment:

Consider a test whose results we are to interpret by comparison with criteria. To do so we must already have decided on a standard of performance and we will regard students who attain it as being significantly different from those who do not…The question is: How do we establish the criterion level? What is to count as the standard? Naturally, we can’t wait to see how students actually do and base our criterion on the average performance of the present group: this would be to go over into blatant norm-referencing. So suppose we base our criterion on what seems reasonable in the light of past experience? Naturally, if the criterion is to be reasonable, this experience must be of similar groups of students in the past. Knowing what has been achieved in the past will help us avoid setting the criteria inordinately high or low. But isn’t this very close to norm-referencing? It would even be closer if we were to base the criterion not just on that of previous students but on students in general.

What is occurring in South Africa is an effort to downplay psychometric measurement, which is linked to the resistance to the unpopular notion of the one-off (norm-referenced) test and to the preference for process-oriented measures (Mclean, 1996:48; Docking, 1994:15). Docking (1994:15) contrasts the “rigorous and detailed management of competency development” with the “‘loosely’ defined evidence which is ‘doctored’ and legitimated through statistical procedures on the other (traditional teaching and assessment).” I find it hard to understand how one can establish any principles of testing without some – indeed, a large – recourse to norms, no matter how “process-oriented” the task is claimed to be. And norms imply psychometric measurement.

What did the others get?

Psychometrics has much to do with “context”, which the “postpositivist”, or “naturalist”, paradigm claims to be absent in the “positivist” paradigm. In naturalistic inquiry “realities are wholes that cannot be understood in isolation from their contexts” (Lincoln and Guba, 1985:39). Thus, owing to different contexts and interactions, generalisations should be made with caution, if at all. The emphasis, “naturalists” argue, should be on time-bound and context-bound working hypotheses (“idiographic” statements) rather than time-free and context-free generalisations (“nomothetic” statements). The point is that one cannot make idiographic statements without reference to nomothetic statements, i.e. the individual and the group are abstractions when isolated from each other. This is no less true in psychometrics. A simple example: if my daughter comes home and tells me that she got 80% for a test, a predictable contextual question would be: “What did the others get?” Further, what is a whole if not a bit of a larger whole, and what is a bit but a whole of a smaller bit? In other words, the notion of a gestalt is relative and therefore can only exist in terms of something (a context) greater and smaller than itself. The paradox of knowledge is that it is impossible to understand the bits – of culture, theory, language, etc. – unless we understand how the whole fits together in its function; and, without an understanding of the structure of the discrete bits, we won’t be able to understand how the whole works (Rorty, 1980:319).
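The question “What did the others get?” can be made concrete with a short sketch. The two class lists below are invented for illustration, and percentile_rank is a hypothetical helper that simply counts the share of the group scoring strictly below a given mark; other definitions of percentile rank exist.

```python
def percentile_rank(score, group_scores):
    """Percentage of the group scoring strictly below the given score."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


# Hypothetical classes: the same 80% sits at opposite ends of each group.
class_a = [35, 40, 45, 50, 55, 60, 65, 70, 75, 80]
class_b = [80, 82, 84, 86, 88, 90, 92, 94, 96, 98]

print(percentile_rank(80, class_a))  # 90.0: nine of ten classmates scored lower
print(percentile_rank(80, class_b))  # 0.0: nobody in the class scored lower
```

Under these assumed figures the identical mark of 80% is the best result in one class and the weakest in the other, which is why the raw score alone says little without its group context.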

Psychometrics, like language, is “fictional”, in the sense of being an evocation, representation or construction of reality. But neither statistics nor language is a deliberate error or a lie. We try our best to measure with and up to the brains we have been given.


Weir (1993:68) believes that there is a more pressing need for research in formative testing, i.e. in process-oriented methods, than for research in summative testing, i.e. “quantitative summaries” (Messick, 1987:3). I believe that there is still a pressing need for research in summative testing, even though mainstream language testing, certainly in South Africa, is taking a different turn. I suggest that the rejection of psychometric measurement in South Africa in the name of restoring individuality to learning is misguided and is consequently having a negative influence on education in South Africa.

Having said that, there is no doubt that “true ethnography demands as much training skill” (Nunan, 1992:53) as psychometric measurement. It is also true that it is much easier to be proved wrong in a psychometric judgement than in an ethnographical one. This could be one reason for avoiding psychometric research.

What is important is that psychometricians and ethnographers both realise that each has a crucial – and complementary – contribution to make to the human sciences, where the “accumulation of data is at best the humble soil in which the tree of knowledge can grow” (Lorenz, 1969:77).

Being mindful of Hesse’s (1980:5) caveat that “if all theories are dangerous and likely to be superseded, so are the present theories in terms of which the inductivist judges the past”, we need to be humble in any claims we may have to ultimate truth (a good example was set by Spolsky [1995] in his dilution of his strong negative attitude towards psychometrics mentioned earlier) because the search for truth is a never-ending path towards understanding and stability of meanings, which is indispensable for individual freedom and social equilibrium.

According to Lincoln and Guba (1985:114), generalizations presuppose facts, but there is no necessity that only one generalization must emerge: “There are always (logically) multiple possible generalizations to account for any set of particulars, however extensive and inclusive they may be”. Yet, if this were so, there could be no knowledge, and therefore no stability of meanings, because ambiguity would be the normal attribute of language. If theories could mean anything, we wouldn’t have unhappy theories that lead to disagreements and misunderstandings. All we’d have is happy hot air.


Bonheim, H. Language testing Panel, European Society of the Study of English (ESSE) conference, (Debrecen, Hungary, September 1997).

Docking, R. “Competency-based curricula – the big picture”, Prospect, 9,2(1994):15.

Hesse, M. Revolutions and reconstructions in the philosophy of science, (Bloomington: Indiana University Press, 1980).


Lincoln, Y.S. and E.G. Guba, Naturalistic inquiry, (Newbury Park, California: Sage Publications, 1985).

Lorenz, K. “On the biology of learning”, in J. Kagan, On the biology of learning. (New York: Harcourt, Brace and World, Inc., 1969).

Mclean, D. “Language education and the national qualifications framework: An introduction to competency-based education and training”, in HSRC. Language assessment and the National Qualifications Framework, (Pretoria: Human Science Research Council Publishers, 1996).

Nunan, D. Research methods in language learning, (Cambridge, New York: Cambridge University Press, 1992).

Rorty, R. Philosophy and the mirror of nature, (Princeton: Princeton University Press, 1980).

Rowntree, D. Assessing students: How shall we know them, (London: Harper and Row, Publishers, 1977).

Spolsky, B. Measured words, (Oxford: Oxford University Press, 1995).

Stevenson, D.K. “Pop validity and performance testing”, in Y. Lee, A. Fok, R. Lord and G. Low (eds.), New directions in language testing, (Oxford: Pergamon, 1985).

Ur, P. A course in language teaching: Practice and theory, (Cambridge: Cambridge University Press, 1996).

Weir, C.J. Understanding and developing language tests, (London: Prentice Hall, 1993).

Yeld, N. “Communicative language testing and validity”, Journal of the South African Association of Language Teaching (SAALT), 21,3(1987):78.
