Per Linguam, 12(1):1996, pp.48-58

Author – Raphael Gamaroff






This article examines concepts often used in debates on language proficiency and proficiency testing. It argues that the notion of “reality” and the “constructed” world of the test are not mutually exclusive, because “reality” is also “constructed”. This argument opens up important questions in language testing.


In the abstract to their article with the above title, Lantolf and Frawley state that they “argue against a definitional approach to oral proficiency and in favor of a principled approach based on sound theoretical considerations” (italics added). The authors use oral proficiency as a backdrop to their views on language proficiency in general. These general views are the concern of this article.

In their section “The tail wagging the dog”, Lantolf and Frawley (1988:182) use the part of Omaggio’s (1986) manual entitled “Defining language proficiency” to illustrate their criticism that the “construct of proficiency, reified in the form of the [American Council on the Teaching of Foreign Languages – ACTFL] Guidelines, has begun to determine how the linguistic performance of real people must be perceived”:

In her discussion she considers various models of communicative competence, including those of Hymes, Munby, Widdowson, and Canale and Swain, all of which are reductionist approaches to communicative competence, because they define communicative competence by reference to a set of constitutional criteria. She then proceeds to a subsection entitled “From Communicative Competence to Proficiency.” However, nowhere in her analysis is there any in-depth consideration of proficiency that is independent of the proficiency test itself [emphasis added].

It is not enough for Lantolf and Frawley that ACTFL recommends “the adoption and acceptance of a common yardstick, a series of descriptors of foreign language ability that are based on real life performance” (Woodford, 1979:73; quoted in Hiple, 1987). They see the ACTFL’s tail (the series of real life descriptors) that is wagging the real dog as not real. The unreal tail is the unreal “construct”; the dog being wagged is real people. The metaphor is clear: it is researchers who have fabricated the “construct”, and fabrications have no psychological reality. In other words, the construct constricts the reality of “the nontest world of human interaction.” The test world, which for these authors epitomises the term construct, “has come to determine the world, the reverse of proper scientific methodology” (Lantolf & Frawley, 1988:182). Ochsner’s (1979:58) opposition to these so-called fabricators is put even more strongly:

If chemists juggled their basic units like we do, their laboratories would blow up. But more damaging, and less debatable, is the obvious fact that research design and the ‘real’ world only sometimes covary. We trade off internal for external validity, or vice versa; either way, we obtain in our experiments important results only from those small, and trivial, bits of human reality that allow a reductive analysis.

The issue is that the only way to obtain “important results” is from “trivial bits of human reality” (Ochsner’s words in the above quotation). Where do “bits” end and “reality” begin? Each thing is a bit of something else. One cannot study wholes without reducing them to bits, to smaller wholes, because there does not seem to be any other way to deal with wholes – besides appreciating them – than superficially. The only effective way of studying “human reality” is bit by superficial bit. Every gestalt is a bit of a larger gestalt (Pratt in Goldmann, 1970:123), where one gestalt may be more satisfying than another depending on individual preference. Human reality, or the whole person, if not reduced, cannot be understood. The labour of a lifetime is required to understand anything from endocrine glands to indoctrination.

The study of language must be one of the most – if not the most – complex of human undertakings. Chomsky, a reductionist par excellence, is witness to this fact. That is why he has chosen to reduce his language interests to the one aspect of language wherein it is possible to fruitfully explore universal principles, namely “linguistic competence”.

The criticism of Hymes, Munby, Widdowson, Canale and Swain, quoted above, suggests that Lantolf and Frawley are concerned not only with oral proficiency but also with language proficiency in general. Recall that they are arguing in “favor of a principled approach based on sound theoretical considerations”, which they seem to think authors such as Widdowson do not use. Yet Widdowson, who was probably not unaware of Lantolf and Frawley’s criticism, ends his Aspects of language teaching (1990) with the following: “There needs to be a continuing process of principled pragmatic enquiry…I offer this book as a contribution to this process – and as such, it can have no conclusion” (my italics). Widdowson perceives the content of both the structural and the notional syllabus to be, in Nunan’s (1988:28) words, “synthetic” and “product orientated”: i.e. the content of both syllabuses is static and lacks the power to consistently generate communicative behaviour. Widdowson’s (1979:141) argument against structuralist and notional syllabuses is that “[i]t has been generally assumed…that performance is a projection of competence…that once the rules are specified we automatically account for how people use language.” His argument is that structural and functional-notional syllabuses do not link past experiences with new experiences, because they lack proper learner involvement (Widdowson, 1979:246). Lantolf and Frawley, and Widdowson, have much in common. It seems that the main difference between them lies in the value they place on school learning. All three believe in “teaching language as communication” (the title of Widdowson, 1978), but much of Widdowson’s work is concerned with academic achievement and what Lantolf and Frawley call “unnatural” school “tasks” (e.g. Widdowson, 1968, 1978, 1983, 1990, 1992) rather than with “real life”.

According to Lantolf and Frawley (1988:183) “tasks cannot be authentic by definition”, which implies that very little in school is authentic, i.e. natural. But: “Just what is a ‘natural environment’ as far as learning or acquiring a second language under any circumstances is concerned?” asks Morrissey (1983:200) (his specific context is the second language “acquisition”/second language “learning” controversy):

There is no environment, natural or unnatural, that is comparable with the environment in which one learns one’s mother tongue. Furthermore, it seems to me that there is a teaching (i.e. unnatural?) element in any L2-L1 contact situation, not just in cases of formal instruction. This element, even if it only consists in the awareness of the communicants that the [teaching or testing] situation exists, may be a more significant factor in L2 learning and L2 acquisition [and L2 testing] than any other factor that is common to [the natural setting of] L1 acquisition and L2 acquisition.

Lantolf and Frawley are seeking a testing situation analogous to the L2 “acquisition” situation, which in Krashen’s (1981) definition is “natural”. Nature is a slippery customer, and nowhere more so than in the tutored (artificial?) environment. Much of language and learning, like culture, consists of extrapsychological elements, which, in this sense, are an “imposition” upon nature. As far as evaluation is concerned, Widdowson’s “empirical evaluation”, where the focus is on the “process of learning”, which he distinguishes from “assessment”, i.e. “learner attainment matched against norms or criteria of success” (Widdowson, 1990:51), seems to be very close to Lantolf and Frawley’s view. It is not Widdowson, Omaggio, Canale and Swain – who undoubtedly analyse language learning and language use into criteria – who are peddling a false authenticity.

There is no doubt that the test situation is artificial in the sense that the (language) test focuses more on learning language than on using it. But then the school situation is also an artificial situation. It is true, as Lantolf and Frawley (1988:183) point out, that in real life one uses far fewer words than one uses in “tasks” in the school situation, and this is one of the reasons why, they maintain, tests “cannot be authentic [i.e. natural] by definition.” However, as Politzer & McGroarty (1983) show, it is possible to say or write few words (as one often does in natural settings) in a “communicative competence” test by using a “discrete-point” format. When one uses far fewer words in natural settings than in many “artificial” school tasks, one is in fact using a “discrete-point” approach to communication. The nub of Lantolf and Frawley’s criticism, however, is that the exchange between tester and test-taker is not a natural one; therefore “discrete-point” formats, or any kind of test format, cannot be a natural kind of communication. Communicative testing, it seems, would be for them a contradiction in terms. What is more, communicative school “tasks” would also be a contradiction in terms. In that case, school, which may be defined as an institution whose role is to guide learners by defining and dispensing tasks, would be another figmental tail wagging the real dog.

The ACTFL Guidelines, according to Lantolf and Frawley, draw a line between the world and the individual. They regard such a situation as scientifically unprincipled and morally untenable. There is very little in “tasks” such as instructional activities and nothing in tasks such as tests that Lantolf and Frawley (1988:182) find authentic in the Proficiency literature: “the task of the test overpowers and detracts from the other tasks [“instructional activities”] we may assume the speakers are engaged in. In essence there is only one task in OP [oral proficiency] testing – the test.” (As I mentioned earlier, Lantolf and Frawley are using the subdomain of OP testing to illustrate their opposition to language proficiency testing in general.) They want language tasks to be contextualised in natural settings such as the “cooking club” (Lantolf & Frawley, 1988), or the “cocktail party” (Alderson, 1981a:56).

Surely what is relevant in a language proficiency test, or in any test, is its predictive validity and not the particular tasks. Obviously, people have to leave school and get to grips with real life, and one should be taught how to cope with the real demands outside the school gates, but teaching and testing are two different, if closely connected, things. If I get my eyes tested, which involves looking at an eye-chart – something I would only be called upon to do when I visit the optician/optometrist and never in real life (unless I were an optician!) – the results of such a test would be an accurate assessment of my ability to cross the road at a busy intersection.

Byrnes and Canale (1987:1), who commend Lantolf & Frawley’s (1985) earlier “efforts [which have been] eloquently and persuasively voiced,” caution that the danger of “the proficiency movement” as espoused by Lantolf and Frawley and others such as Savignon (1985), as “with any movement, is that a rhetoric of fear and enthusiasm will develop which is more likely to misrepresent and confuse than to clarify the crucial issues.” One crucial issue is the measurement of language proficiency. Alderson’s (1981b:56-57) description of the problem, i.e. whether what we are measuring is authentic, is worth quoting in full (what he says about language evaluation is relevant to all evaluation):

If one is interested in whether students can perform adequately (adequacy being undefined for the moment) at a cocktail party, ‘all’ one has to do is put the student into a cocktail party and see how he fares. The obvious problems with this are that it may not always be possible to put the student into a cocktail party (especially if there are several thousand students involved [or even a hundred, R.G.]), and the fact that the performance is being assessed may actually change the nature of the performance. One solution is to simulate the cocktail party in some way, but that raises problems of authenticity, which relate to the second problem, that of the relationship between the performance and its assessment. Inevitably, any test is in danger of affecting the performance if the testee is aware that he is being tested. To that extent, it is impossible for a test to be ‘authentic’ in the sense of mirroring reality. Of course, tests are themselves authentic situations, and anything that happens in a testing situation must be authentic in its own terms: the problem comes when one tries to relate that testing situation to some authentic communicative situation. In a sense, the argument about authenticity is trivial in that it merely states that language varies from situation to situation. The feeling was expressed that the pursuit of authenticity in our language tests is the pursuit of a chimera: it is simply unobtainable because they are language tests.

Consider the role of grammar in learning/teaching language for communication. Grammar is not something that one raises one’s conscious glasses to at a cooking club or at a cocktail party. Yet it is an essential, if not the major, ingredient in the cake. Widdowson (1990:97-98) puts the case for grammar in the communicative approach to language teaching:

It seems sometimes to be supposed that what is commendable about a communicative approach to language teaching is that it does not, as a structural approach does, have to get learners to puzzle their heads with grammar. If we are looking for nonsense, this suggestion is a prime example. For if this were really the case, a communicative approach would have very little to commend it. For language learning is essentially learning how grammar functions in the achievement of meaning and it is a mistake to suppose otherwise…. A communicative approach, properly conceived, does not involve the rejection of grammar. On the contrary, it involves recognition of its central mediating role in the use and learning of language.

What Widdowson has stated concerning language learning should be, I would think, the main concern of language testing as well. That is why the concept of an integrative continuum ranging from basic indirect grammatical elements to complex direct language use is very useful, because it emphasises the organic relationship between grammar and discourse, or to use Widdowson’s (1979) terms, between “usage” and “use”. Jeffery’s call (1990:120) to open the case for grammar wider is similar to Widdowson’s call not to neglect the conscious awareness of how grammar functions: in fact a call to make grammar the main link between language learning and language use.

This approach to grammar as “consciousness raising” (Rutherford, 1987) is an extremely useful one in the academic context of language learning. It also has a spin-off for learning in general, because the conscious cognitive and metacognitive strategies required for hypothesising and inferencing are similar in both domains (e.g. if this, then [not] that; if not this, then [not] that). One adopts such an approach based on an appreciation of the following points, in Politzer and McGroarty’s words (1983:186):

Traits of communicative competence can be defined in very precise and specific ways and measured by the same discrete point method as linguistic competence. If these principles are followed, communicative competence emerges as quite distinct from linguistic competence. The two kinds of competence are however related: communicative competence includes abilities which go beyond linguistic competence and which, to a certain degree, can make up for some deficiency in the latter. At the same time, lower levels of linguistic competence impose limits on communicative competence. Any language-related level of communicative competence has a minimum level of linguistic competence as a prerequisite. Communicative competence presupposes linguistic competence, but is not guaranteed by the latter.


Lantolf and Frawley’s rejection of school tests is closely related to their rejection of the psychometric paradigm. They introduce their critique of psychometrics by quoting from Duran (1984:54):

[Developers of proficiency tests] who wish to develop scales of communicative competence skills are unlikely to leave their psychometric perspective – nor should one expect them to. Accordingly, the instrument development strategies for communicative competence skills should adhere to the highest standards of psychometric test design principles.

Lantolf and Frawley argue that such a view is a false imposition and an imposture. To summarise their argument: the direct as well as indirect domination of psychometrics in language testing has obfuscated the concept of proficiency. Psychometrists are more interested in keeping their tools of measurement intact than in the object of measurement. They quote Lewontin, Rose and Kamin (1984:91): “the fact that it is possible to devise tests on which individuals score arbitrary points does not mean that the quality being measured by the test is really metric. The illusion is provided by the scale.”

Lantolf and Frawley believe that language proficiency is “real”, whereas psychometric measurement is not. But how real is (“natural”) language compared to the (“artificial”) language metrics? Real-life tests, unshackled by psychometric validity and reliability, remain prisoners of imprecision. If (metric) scales are reductions of reality – many psychometrists are acutely aware of the fact that the scale is reductionist – so is language, and perhaps even more so. The illusion, contrary to Lewontin, Rose and Kamin, is not in the scale, but in language, because an illusion is, by definition, a deception. Many psychometrists are not deceived by measurement, because they are indeed aware of its limitations. They are aware that scales are reductionist, but they are also aware that all kinds of knowledge, not only numbers, are, by definition, reductionist, because the only human way of accommodating the “logically uncrossable gulf” that separates the world of impressions from the mental or physical world is to reduce the world to a symbolic system, be it words or numbers (Oller, 1995:282, in his commentary on Einstein). Words are double-edged swords: they cut to clarify but also to kill. Words often also stand in the place of reality, but many protagonists of “real-life” language proficiency – a contradiction in terms – don’t seem to be aware of this fact. If psychometrics is close to being fictional (magical?), so is language. In Jerison’s (1986:9) words:

It is as if we could communicate by having others see what we see and hear what we hear. The normal role of language in communication is very close to fictional accounts of communication by extrasensory means and may explain the attractiveness of ideas of such psychic powers. These imagined powers are not far removed from what we do in everyday life when we use ordinary language.

Knowledge cannot but be degenerate, owing to the “logically uncrossable gulf” between Light and reality. Fictional accounting or counting are degeneracies of reality but not as degenerate as errors or lies or illusions. Some kinds of fictional accounting, e.g. hypothetical inferences, predictions, expectancies (Oller, 1995:279), which are the conceptual tools of academic study, indeed of language study and language use as well, progressively degenerate into errors, then into lies, then into illusion or pure nonsense. Some kinds of fictional counting, e.g. the psychometric measurement of human behaviour, may do likewise. All statistics, like all language, is fictional, in the sense described above. But statistics, like language, is not an error or a (damn) lie by “nature”, or by design.

To elaborate on the issue of reductionism in terms of the relationship between levels/scales of language and psychometrics, Lantolf and Frawley maintain that

the lack of agreement among psychometricians on the number of levels to include in a proficiency hierarchy is further indication of the primacy granted to psychometric principles to the detriment of a clear understanding of the concept under investigation…What is missing from all of the scales, as far as we can determine, is sound justification that proficiency is indeed scalable in the manner assumed.

Spolsky is a firm believer in levels, which Lantolf and Frawley associate with psychometrists. But as is well known, Spolsky (1978, 1985) holds psychometrics in contempt: he refers to psychometrists as “hocus pocus scientists in the fullest sense of the word” (Spolsky, 1985:33-34). Accordingly, it is possible to reject psychometrics and believe that “items are added one at a time” (Spolsky, 1989:61), which is an “atomistic approach to language” (Alderson, 1981a:47). But this does not mean that such an approach is, as Rutherford contends (1987:36-37), “static” and “mechanistic”. (See Alderson’s [1981a] criticism of Morrow [1981] for a succinct account of the false dichotomy between “atomistic” and “holistic” approaches to language testing.)

Further, constructs, i.e. what (we think) is going on in each individual’s invisible mind, can be scientifically inferred and described only when one has some idea of what is going on in many individual minds, i.e. what is going on in a group. In other words, it should be recognised that in some ways norm-referenced and criterion-referenced tests are mere abstractions if separated from each other. Rowntree (1977:185) explains:

Consider a test whose results we are to interpret by comparison with criteria. To do so we must already have decided on a standard of performance and we will regard students who attain it as being significantly different from those who do not…The question is:

How do we establish the criterion level? What is to count as the standard? Naturally, we can’t wait to see how students actually do and base our criterion on the average performance of the present group: this would be to go over into blatant norm-referencing. So suppose we base our criterion on what seems reasonable in the light of past experience? Naturally, if the criterion is to be reasonable, this experience must be of similar groups of students in the past. Knowing what has been achieved in the past will help us avoid setting the criteria inordinately high or low. But isn’t this very close to norm-referencing? It would even be closer if we were to base the criterion not just on that of previous students in general. (Emphasis added).

The main problem in human evaluation is how to assess human abilities in an individual (i.e. authentic), real way. One is conscious of the danger that:

group statistics may falsify the facts of individual speech, since individuals having a given phenomenon always present or always absent are lost in group statistics among the hordes where uses of the phenomenon more obviously reflect the Great Bell Curve in the sky.

(Bailey, 1976:97-98; cited in Nicholas & Meisel, 1983:82. Nicholas and Meisel’s context is second language acquisition)

The “Great Bell Curve” may be a pie in the sky, but as many (including mathematicians and statisticians) are aware, the Great Bell Curve in the sky is also very much down to the cosmic offshoot we call earth.

In conclusion, the problem is not only how to assess individual people, but how to assess “individual” criteria (grammar, content, and so forth). As Rowntree argues above, the individual or the group in isolation is an abstraction from social reality. Criteria – as Lantolf and Frawley, and most psychometrists as well, would agree – are abstractions when separated from the functional language of life. But if life is not reduced to criteria, knowledge – and the writing of articles (on such topics) – is not possible:

[W]hat kind of explanation will satisfy us if we wonder how a complicated machine, or living body, works?… If we wish to understand how a machine or living body works, we look at its component parts and ask how they interact with each other. If there is a complex thing that we do not yet understand, we can come to understand it in terms of simpler parts that we do already understand.

(Dawkins, 1987:11)


ALDERSON, J.C. 1981a. Reaction to the Morrow paper. In: Alderson, J.C. & Hughes, A. Issues in language testing: ELT Documents III. The British Council.

ALDERSON, J.C. 1981b. Report of the discussion on general language proficiency. In: Alderson, J.C. & Hughes, A. Issues in language testing: ELT Documents III. The British Council.

BAILEY, C.J. 1976. The state of no-state linguistics. Annual review of anthropology, 5:93-106.

BYRNES, H. & CANALE, M. (EDS.). 1987. Defining and developing proficiency: Guidelines, implementations and concepts. Lincolnwood (Chicago), Illinois: National Textbook Company. In conjunction with the American Council on the Teaching of Foreign Languages.

DAWKINS, R. 1987. The blind watchmaker. New York: W.W. Norton & Company.

DURAN, R.P. 1984. Some implications of communicative competence research for integrative proficiency testing. In: Rivera, C. (ed.). Communicative competence approaches to language proficiency assessment: Research and application. Clevedon, England: Multilingual Matters.

GOLDMANN, L. 1970. Structure: Human reality and methodological concept. In: Macksey, R. & Donato, E. The structuralist controversy. Baltimore: The Johns Hopkins University Press.

HIPLE, D.V. 1987. A progress report on the ACTFL proficiency Guidelines, 1982-1986. In: Byrnes, H. & Canale, M. (eds.). Defining and developing proficiency: Guidelines, implementations and concepts. Lincolnwood (Chicago), Illinois: National Textbook Company. In conjunction with the American Council on the Teaching of Foreign Languages.

JEFFERY, C.D. 1990. The case for grammar: Opening it wider. South African Journal of Higher Education (SAJHE), Special edition.

JERISON, H.J. 1986. Evolutionary biology of intelligence: The nature of the problem. In: Jerison, H.J. & Jerison, I. (eds.). Intelligence and evolutionary biology. New York: Springer-Verlag.

KRASHEN, S. 1981. Second language acquisition and second language learning. Oxford: Pergamon Press.

LANTOLF, J.P. & FRAWLEY, W. 1985. Oral-proficiency testing: A critical analysis. Modern Language Journal, 69(4):337-45.

LANTOLF, J.P. & FRAWLEY, W. 1988. Proficiency: Understanding the construct. Studies in Second Language Acquisition (SSLA), 10(2):181-195.

LEWONTIN, R.C., ROSE, S. & KAMIN, L.J. 1984. Not in our genes: Biology, ideology and human nature. New York: Pantheon.

MEISEL, J.M., CLAHSEN, H. & PIENEMANN, M. 1981. On determining developmental stages in second language acquisition. Studies in Second Language Acquisition,


MORRISSEY, M.D. 1983. Toward a grammar of learner’s errors. International Review of Applied Linguistics, 21(3):193-207.

MORROW, K. 1981. Communicative language testing: Revolution or evolution. In: Alderson, J. (ed.). Issues in language testing. ELT Documents, The British Council.

NICHOLAS, H. & MEISEL, J.M. 1983. Second language acquisition: The state of the art. In: Felix, S.W. & Wode, H. (eds.). Language development at the crossroads: Papers from the Interdisciplinary Conference on Language Acquisition at Passau. Tübingen: Gunter Narr Verlag.

NUNAN, D. 1988. Syllabus design. London: Oxford University Press.

OCHSNER, R. 1979. A poetics of second-language learning. Language Learning, 29(1):53-80.

OLLER, J.W. JR. 1995. Adding abstract to formal and content schemata: Results of recent work in Peircean semiotics. Applied Linguistics, 16(3):274-306.

OMAGGIO, A.C. 1986. Teaching language in context: Proficiency-orientated instruction. Boston, Massachusetts: Heinle & Heinle.

POLITZER, R.L. & MCGROARTY, M. 1983. A discrete-point test of communicative competence. International Review of Applied Linguistics, 21(3): 179-191.

ROWNTREE, D. 1977. Assessing students: How shall we know them. London: Harper & Row, Publishers.

RUTHERFORD, W.E. 1987. Second language grammar: Learning and teaching. London:


SAVIGNON, S.J. 1985. Evaluation of communicative competence: The ACTFL Provisional Proficiency Guidelines. Modern Language Journal, 69(2):129-134.

SPOLSKY, B. 1978. Approaches to language testing. In: Spolsky, B. (ed.). Advances in Language Testing Series, 2. Arlington, Virginia: Center for Applied Linguistics.

SPOLSKY, B. 1985. The limits of authenticity in language testing. Language Testing, 2:31-40.

SPOLSKY, B. 1989. Conditions for second language learning. Oxford: Oxford University Press.

WIDDOWSON, H.G. 1968. The teaching of English through science. In: Dakin, J., Tiffin, B. & Widdowson, H.G. Language in education. London: Oxford University Press.

WIDDOWSON, H.G. 1978. Teaching language as communication. Oxford: Oxford University Press.

WIDDOWSON, H.G. 1979. Explorations in applied linguistics. Oxford: Oxford University Press.

WIDDOWSON, H.G. 1983. New starts and different kinds of failure. In: Freedman, A., Pringle, I. & Yalden, J. (eds.). Learning to write: First language/Second language. London: Longman.

WIDDOWSON, H.G. 1989. Knowledge of language and ability of use. Applied Linguistics, 10(2):128-137.

WIDDOWSON, H.G. 1990. Aspects of language teaching. Oxford: Oxford University Press.

WIDDOWSON, H.G. 1992. Communication, community and the problem of appropriate use. In: Alatis, J.B. Georgetown University Round Table on Languages and Linguistics. Washington, D.C.: Georgetown University Press.

WOODFORD, P. 1979. Foreign language testing background. In: President’s Commission of Foreign Language and International Studies: Background Papers and Studies. Washington, D.C.: Government Printing Office.
