Cloze Tests as Predictors of Global Language Proficiency: A Statistical Analysis

South African Journal of Linguistics, 1998, 16 (1), 7-15.

Author: Raphael Gamaroff


1.   Introduction

2.   Literature review of the cloze procedure as a test of reading

3.   Closure, and deletion procedures in cloze

4.   Qualitative and quantitative methods

4.1 The limitations of error analysis

4.2 Quantitative methods in assessing levels of proficiency

5.   Method

5.1 Subjects

5.2 Instruments

6.   Results

7.   Discussion

8.   Bibliography


The usefulness of the cloze test is examined for assessing levels of language proficiency. The methodology involves a statistical analysis of levels of proficiency using the group-differences method of construct validity. Pienaar’s (1984) cloze tests are used to assess the level of Grade 7 pupils, who represent a wide range of English proficiency. It is argued that although qualitative analysis is important, the role of statistical analysis is crucial in understanding the construct of language proficiency.

1. Introduction

The educational context of this article is Mmabatho High School in the North West Province of South Africa, where I spent over seven years (January 1980 to April 1987) as a teacher of second languages (English and French) and researcher in the learning of English as a second language. In January 1987, I administered, in collaboration with the Grade 7 teachers, a battery of English proficiency tests to find out the level of English proficiency of entrants to Grade 7 at the School, where English was used as the medium of learning and instruction. The battery consisted of two essay tests, two dictation tests (Gamaroff, forthcoming), two cloze tests, an error recognition test and a mixed grammar test. The cloze tests discussed in this article are part of this test battery. Pienaar’s (1984) “reading for meaning” cloze tests were used.

The methodology consists of a statistical analysis of levels of proficiency, in which the importance of statistics in the assessment of language proficiency is emphasised.

2. Literature review of the cloze procedure as a test of reading

Cloze tests are deceptively simple devices that have been constructed in so many ways for so many purposes that an overview of the entire scope of the literature on the subject is challenging to the imagination not to mention the memory.

(Oller, 1973:106)

Since 1973 the literature on cloze has more than doubled, adding even more challenges to the imagination if not – thanks to the printed word – to the memory.

The aim of a cloze test is to evaluate (1) readability and (2) reading comprehension. The origin of the cloze procedure is attributed to Taylor (1953), who used it as a tool for testing readability. Of all the formulas of readability that have been devised, cloze tests have been shown, according to Geyer (1968), Weintraub (1968) and Oller (1973:106), to be the best indicators of readability. The cloze procedure is also regarded as a valid test of reading comprehension. Oller (1973:106) cites Bormuth (1969:265), who found a multiple correlation coefficient of .93 between cloze tests and other linguistic variables that Bormuth used to assess the difficulty of several prose passages. Bormuth (1969:265) maintains that cloze tests “measure skills closely related or identical to those measured by conventional multiple choice reading comprehension tests.”

Many standardised reading tests use cloze tests, e.g. the Stanford Proficiency Reading Test. Johnson and Kin-Lin (1981:282) believe that cloze is more efficient and reliable than reading comprehension, because it is easier to evaluate and does not, as in many reading comprehension tests, depend on long written answers for evaluation. (But it is also possible to use multiple choice reading tests; see Bormuth [1969] in the previous paragraph). Johnson and Kin-Lin’s implication is that although cloze and reading comprehension are different methods of testing, they both tap reading processes. Anderson (1976:1), however, maintains that as there is no consensus on what reading tests actually measure, all that can be said about a reading test is that it measures reading ability. On the contrary, far more can be said about reading: notions associated with reading are “redundancy utilization” (Weaver & Kingston, 1963), “expectancies about syntax and semantics” (Goodman, 1969:82) and “grammar of expectancy” (Oller, 1973:113). All these terms connote a similar process. This process involves the “pragmatic mapping” of linguistic structures into extralinguistic context (Oller, 1979:61). This mapping ability subsumes global comprehension of a passage, inferential ability, perception of causal relationships and deducing meaning of words from contexts (Schank, 1982:61). According to Bachman (1982:61)

‘[t]here is now a considerable body of research providing sound evidence for the predictive validity of cloze test scores. Cloze tests have been found to be highly correlated with virtually every other type of language test, and with tests of nearly every language skill or component.’

Clarke (1983), in support of Bachman, is cautiously optimistic that the cloze procedure has a good future in reading research. Alderson (1979:225), who is less optimistic, maintains that

individual cloze tests vary greatly as measures of EFL proficiency. Insofar as it is possible to generalise, however, the results show that cloze in general relates more to tests of grammar and vocabulary (ELBA tests 5 and 6) than to tests of reading comprehension (ELBA test 7).

(The ELBA [English Language Battery] originates from the University of Edinburgh [Ingram, 1964, 1973].)

Johnson & Kin-Lin (1981) and Oller (1979), contrary to Alderson, found that a great variety of cloze tests correlates highly with reading tests.

Alderson (1979) also believes, as do Hughes (1981) and Porter (1978), that individual cloze tests produce different results, and that each cloze test “needs to be validated in its own right and modified accordingly” (Alderson, 1979:226). Such a view is contrary to the view of Johnson & Kin-Lin (1981) and Oller (1979), mentioned in the previous paragraph, that there is a high correlation between all kinds of cloze tests, indeed between cloze tests, dictation tests, essay tests and “low order” (Alderson, 1979) grammar tests, which indicates that cloze is a valid test of global, or general, language proficiency (see also Brown, 1983; Fotos, 1991; Hale et al., 1984; Oller, 1973; Stubbs & Tucker, 1974). However, research keeps bringing to light examples that show how difficult it is to establish the validity of cloze tests as tests of global proficiency, for example, the effect of text difficulty or content on cloze scores (Alderson, 1979; Piper and McEachern, 1988) and on reading proficiency, specifically in the English for Special Purposes situation (Alderson & Clapham, 1992). With regard to cultural content in cloze testing, Chihara, Sakurai and Oller (1989) found in their English cloze tests for Japanese speakers that minor adjustments such as changing names from Joe or Nicholas to Japanese names produced a gain of six per cent over tests that had not been modified (Oller, 1995).

In spite of these problems, the evidence is strong that cloze is a valid and reliable test of global proficiency (Oller & Jonz, 1994; Walker, Rattanavich & Oller, 1992). I shall return to the validity of the cloze test as an indicator of global proficiency in the discussion of the results at the end of the article.

3. Closure, and deletion procedures in cloze

Closure is a pivotal concept in cloze theory. Closure does not merely mean filling in items in a cloze, but filling them in a way that reveals the sensitivity to intersentential context, which measures “higher-order skills” (Alderson, 1979:225). A cloze test that lacks sufficient closure would not be regarded as a good cloze test. According to Alderson (1979:225) “the cloze” is sentence-bound. Alderson (1979:225) states that

‘one must ask whether the cloze is capable of measuring higher-order skills. The finding in Alderson (1978) that cloze seems to be based on a small amount of context, on average, suggests that the cloze is sentence – or indeed clause – bound, in which case one would expect a cloze test to be capable of measuring, not higher-order skills, but rather much low-order skills…as a test, the cloze is largely confined to the immediate environment of a blank.’

This means that there is no evidence that increases in context make it easier to complete items successfully. Oller (1976:354) maintains, contrary to Alderson, that subjects “scored higher on cloze items embedded in longer contexts than on the same items embedded in shorter segments of prose”. Oller used five different cloze passages and obtained similar results on all of them.

With regard to methods of deletion, Jacobs (1988:47) lists two basic methods: fixed deletion and rational deletion. In the fixed deletion method, every nth word is deleted, where n may range between five (every fifth word is believed to be the smallest gap permissible without making the recognition of context too difficult) and nine. The rational deletion method is not fixed but is based on “selective” deletion (Ingram, 1985:241). Pienaar’s (1984) “reading” tests, which are used in this article, are rational deletion cloze tests.
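The fixed deletion method can be illustrated with a short Python sketch. It is not taken from Jacobs or Pienaar; the function name `make_fixed_ratio_cloze`, the blank format and the handling of punctuation are assumptions made purely for illustration.

```python
import string


def make_fixed_ratio_cloze(text, n=7, skip=0):
    """Replace every n-th word with a numbered blank (illustrative sketch).

    `skip` leaves that many words at the start of the passage intact,
    so that the reader has some context before the first blank.
    Returns the cloze text and the list of deleted words (the answer key).
    """
    answers = []
    output = []
    for index, word in enumerate(text.split()):
        position = index - skip + 1  # 1-based position after the lead-in
        if position > 0 and position % n == 0:
            core = word.strip(string.punctuation)  # keep commas, full stops
            if core:
                answers.append(core)
                blank = f"({len(answers)}) ____"
                output.append(word.replace(core, blank, 1))
                continue
        output.append(word)
    return " ".join(output), answers


passage = ("The rain started falling from the sagging black clouds "
           "towards evening.")
cloze_text, answer_key = make_fixed_ratio_cloze(passage, n=5)
print(cloze_text)
print(answer_key)
```

A rational deletion test cannot be generated mechanically in this way, because the choice of each deleted word depends on the test constructor's judgement.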

Alderson (1980:59-60) proposes that the rational deletion procedure should not be referred to as a “cloze” but as a “gap-filling” procedure. Such a proposal has been accepted by some researchers, e.g. Weir (1993:81), but not accepted by others, e.g. Bachman’s (1985) “Performance on cloze tests with fixed-ratio and rational deletions”, Maclean’s (1984) “Using rational cloze for diagnostic testing in L1 and L2 reading” and Markham’s (1985) “The rational deletion cloze and global comprehension in German.” There is nothing wrong with the proposal that the rational-deletion procedure be called a gap-filling test, if it remains nothing more than that – a proposal.

Alderson (1979:226) suggests that what he calls cloze tests (namely, every nth word deletion) should be abandoned in favour of “the rational selection of deletions, based upon a theory of the nature of language and language processing”. Thus, although Alderson proposes that the rational selection of items (his “gap-filling”) should not be called a cloze procedure, he still favours “gap-filling” tests over “cloze” tests. This superiority of the rational deletion method is supported by a substantial body of research, e.g. Bachman (1982) and Clarke (1979). However, it should be kept in mind that language hangs together, and thus the every-nth-word cloze test is also, in my view, a good test of global proficiency. In other words, whether one uses “fixed” deletions or “rational” deletions, both methods test global proficiency.

Having considered the arguments for the validity of the cloze test as a test of reading, it seems that cloze tests are valid tests of reading strategies, i.e. they can test long-range contextual constraints. One must keep in mind, however, that deletion rates, ways of scoring (e.g. acceptable words or exact words), and types of passages chosen in terms of background knowledge and of discourse devices, may influence the way reading strategies are manifested. But it is debatable whether one should make too much of these differences.

4. Qualitative and quantitative methods

Before dealing with the method of the investigation, it would be useful to say something about construct validity and the relationship between quantitative methods and qualitative methods in language proficiency testing. Quantitative methods deal with data that involve statistics, while qualitative measurement (one doesn’t really “measure” qualitative data, but analyses them) deals with data that do not involve statistics, e.g. an error analysis.

4.1 The limitations of error analysis

Without some knowledge of the subject’s ability and previous output – revealed through a longitudinal study of different outputs – a valid interpretation of errors is difficult to achieve. It is usually less difficult to infer processes from the production of grammatical errors than processes from the production of lexical items, where there exists no traditional corpus of errors. On a few occasions the subjects gave the same wrong answer to a cloze item, but usually they gave different wrong answers. When there are similar wrong answers, it may be easier to find a principled explanation. However, when there is a variety of wrong answers to the same item, I don’t think that much is to be gained by doing a “linguistic” analysis, owing to the mega-encyclopaedic range of possible interpretations. Testers may, though, gain some insight from a test taker’s self-editing or from interviews with individual test takers after the test. However, owing to time constraints in the testing/teaching situation, it is often not possible to hold interviews. In my testing situation at Mmabatho High School, it would have required at least half an hour with each subject on each cloze passage. Further, to ask 12-year-old L2 English learners – this would be true of many L1 English speakers as well – to explain in a second language, or even in their mother tongue, the information-processing strategies they used in a test or in any other kind of learning behaviour is fraught with problems. So, even if one could interview test takers after a test – whether in the second language or the mother tongue – one can never be sure whether they have understood the mental processes involved. The interpretation/evaluation problem for raters is not only distinguishing between the processes and products of the test taker, but between the rater’s own processes and products.

The process-product distinction is far more useful in what it should be saying about where process and product meet than what it says about where they separate (Besner, 1985:9).

A pitfall of error analysis is that the more satisfying the explanation, the foggier the idea may be of what is going on in the test taker’s head. Thus, in an error analysis it is indeed possible to label the error in purely linguistic terms, but the more important diagnostic issue of why specific errors are committed remains largely a mystery. Raters are like inquisitive insects wandering around in a “gigantic multi-dimensional cobweb” in which every item requiring an interpretation is attached to a host of others (Aitchison, 1987:72).

4.2 Quantitative methods in assessing levels of proficiency

Although qualitative methods in language testing are useful, qualitative analysis without quantitative measurement (statistics, or psychometrics) would be of limited value (Gamaroff, 1996, 1997a).

According to Nunan (1992:20), there exists a dichotomy between objective-quantitative and subjective-qualitative research. This dichotomy is understandable owing to the danger of the individual getting lost in the thicket of the group, or norm. However, the psychometrist, whose business is norm-referenced testing, is not a “hocus-pocus” scientist, as Spolsky (1985:33-34) thinks, because any interpretation of test results by comparison with criteria must presuppose a criterion level. And the only way to establish what learners/test takers do is to base the criterion on the average performance of a group (Rowntree, 1977:185). Without a comparison of levels of proficiency, it is not possible to establish the construct validity of a test. That is why the group-differences approach is so important in construct validation. This approach is now explained:

The aim of testing is to discern levels of ability. If one uses reading ability as an example of a construct, one would hypothesise that people with a high level of this receptive (but not at all passive) ability would have a good command of sentence structure, cohesion and coherence; while people with a low level of this ability would have a poor command of these. Tests are then administered, e.g. cloze tests, and if it is found that there is a significant difference between a group of high achievers and a group of low achievers, this would be valid evidence for the existence of the construct. In general, second language learners are relatively less competent than mother tongue speakers. If a test fails to discriminate between low-ability and high-ability learners, there are three possible reasons for this:

1. The test may be too easy or too difficult.

2. The theory undergirding the construct is faulty.

3. The test has been inaccurately administered and/or scored, that is, it is unreliable.
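The comparison at the heart of the group-differences approach can be sketched in Python. The sketch uses Welch's t statistic as one common way of testing whether two group means differ; this choice, the function name `welch_t` and the two sets of scores are assumptions for illustration only and are not the data or the procedure of this investigation.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom.

    Welch's test does not assume that the two groups have equal
    variances, which is a cautious choice for groups of different kinds.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Sample variance of each group divided by its size
    var_a, var_b = variance(group_a) / n_a, variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical cloze scores out of 10 (NOT the scores of this study)
high_group = [8, 9, 7, 10, 8, 9]
low_group = [4, 5, 3, 6, 5, 4]
t_value, degrees_of_freedom = welch_t(high_group, low_group)
print(round(t_value, 2), round(degrees_of_freedom, 1))
```

A large t value relative to the critical value for the given degrees of freedom would count as evidence that the test discriminates between the two groups; a value near zero would point to one of the three problems listed above.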

5. Method

5.1 Subjects

The sample consists of 80 subjects from the “ethnic” groups shown in Table 1. For the purposes of my statistical analysis, I use “L1” and “L2” to distinguish between groups in terms of whether they take English as a first-language course subject or as a second-language course subject at the School, and not in terms of whether they have English as their mother tongue, or main language, or not, which are the usual meanings of L1 and L2. (When I do use L1 and L2 in their usual meanings, I shall make it clear that I am doing so).

Table 1. Detailed Composition of the Sample

The vast majority of the L1 group originated from Connie Minchin Primary School, Mmabatho, which was the main feeder school for Mmabatho High School from 1980 to 1990. Some of the Tswanas in the L1 group originated from ex-DET (Department of Education and Training) schools. The L2 group of 38 subjects originated from 29 different schools in the North West Province; thus, from each of these schools there was, in general, only one entrant to Mmabatho High School.

Entrants decided themselves whether they belonged to the L1 or the L2 group, i.e. whether they wanted to take English as a first or second language subject. I shall argue that classifications of such a nature should be based on valid and reliable test scores and not on the classifications of the entrants themselves.

In South Africa, it is often not clear which individuals or groups use their mother tongue as their main language. There are two possible reasons for this: 1. several languages may be spoken at home because one or both parents speak more than one language; the language of one of the parents, usually the more powerful parent, begins to predominate at the age of about four or five years, and this is often the father’s tongue; and 2. uprooting caused by adverse social and economic circumstances. In such circumstances children may be removed from their mother tongue environment and placed with other families that speak a different language. For example, a seven-year-old Xhosa child from the Eastern Cape might be placed with a Tswana family in the North West Province. Tswana then becomes the replacement language. It is possible that such a pupil might have only limited proficiency in both Tswana and English. The uprooting may have occurred not only among black families but among Coloured and Indian families as well. I shall deal with the replacement language issue in the discussion of the results.

5.2 Instruments

Two cloze tests from Pienaar’s (1984) pilot survey “Reading for meaning” are used. These tests have already been used in many schools in the North West Province and have produced a solid body of results. Many cloze tests consist of 50 deletions, because this number is thought to ensure high reliability and validity. Pienaar’s cloze tests each consist of only 10 items. I shall examine whether tests with so few deletions can be regarded as valid and reliable.

The question was whether to use the same tests for these two different language groups (L1 and L2) but give each of these groups its own norms. I decided against this because when the same syllabus (except for the language subjects) is used by L1 and L2 pupils, as was the case at Mmabatho High School, both groups have to contend with the same academic demands and content; and it is the effect that low scores on these cloze tests have on academic achievement that Pienaar and I were ultimately concerned with.

In a review of Pienaar (1984), Johanson (1988:27) refers to the “shocking” low reading levels in many North West Province (ex-Bophuthatswana) schools revealed by Pienaar’s survey. Pienaar’s major finding was that 95% of pupils (Grade 3 to Grade 12) in the North West Province were “at risk”, i.e. they couldn’t cope with the academic reading demands made on them (See also Macdonald, 1990a, 1990b).

Pienaar’s (1984) tests comprise five graded levels – “Steps” 1 to 5, where each Step consists of four short cloze passages (Form A to Form D) with 10 blanks in each passage (Pienaar, 1984:41):

Step 1 corresponds to Grades 3 and 4 (Stds 1 and 2) for English first language and to Grades 5 to 7 for English second language. (Pienaar is using the term first language (L1) in the usual way, i.e. a mother tongue or a language a person knows best).

Step 2 corresponds to Grades 5 and 6 for first language and to Grades 7 to 8 for second language.

Step 3 corresponds to Grades 7 and 8 for first language and to Grades 9 to 11 for second language.

Step 4 corresponds to Grades 9 and 10 for first language and to Grades 11 and 12 for second language.

Step 5 corresponds to Grades 11 and 12 for first language and to Grade 12 + for second language.

If one Step proves too easy or too difficult for a specific pupil, a higher or a lower Step could be administered. For example, if Step 2 is too difficult, the pupil can be tested on Step 1. In this way it is possible to establish the level of English proficiency for each individual pupil. It must be stressed that the purpose of Pienaar’s cloze tests is to serve as predictors of general academic achievement and English achievement.

As shown above, Pienaar built into his tests an adjustment that distinguishes between L1 and L2 pupils; e.g. Step 2 is meant for Grades 5 and 6 L1 pupils and for Grades 7 to 8 L2 pupils.

It is only after the test has been performed on the “test-bench” (Pienaar, 1984:5) that it is possible to decide whether the test is too easy or too difficult (see also Cziko, 1982:368). If there are L1 and L2 subjects in the same sample, as is the case in this investigation, one might need to consider whether the norms of the L1 and the L2 groups should be separated or interlinked and how to classify precisely the L1 and L2 subjects used for the creation of norms (Baker, 1988:399). At Mmabatho High School, entrants decided themselves whether they wanted to take English first language as a course subject (designated as L1 in this article) or English second language as a course subject (designated as L2 in this article). The point of the proficiency tests, e.g. the cloze test, was to compare the former kind of classification with the test score classification.

According to Pienaar (1984), a perfect score on a cloze test indicates that the pupil has fully mastered that particular level. A score of 50% would indicate that the pupil is not ready for the next stage. Pienaar’s (1984) view is that pupils are expected to do well before they are permitted to move on to the next stage, e.g. a pupil with 50% for a Grade 7 cloze test should be in a lower grade. Recall that Pienaar is claiming that his tests are valid predictors of academic achievement.

Pienaar (1984:41) maintains that L2 pupils, i.e. ESL pupils, are generally two to three years behind English L1 pupils in the acquisition of English proficiency, and that there is often also a greater age range in the English second language classes, especially in the rural areas.

The tests were standardised in 1982 on 1068 final year JSTC (Junior Secondary Teacher’s Certificate) and PTC (Primary Teacher’s Certificate) students from nine colleges affiliated to the University of the Transkei. These standardised results became the table of norms for the tests (Pienaar, 1984:9). Below are the weighted mean scores achieved by the students of the nine colleges (Pienaar, 1984:10):

                    Step 1    Step 2    Step 3    Step 4    Step 5
Weighted means:     67%       53%       37%       31%       24%

Most of the colleges performed similarly on all five Steps. These results confirmed the gradient of the difficulty of the various steps.
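A weighted mean of this kind gives each college's mean score a weight equal to the number of students tested at that college. Pienaar's college-level figures are not reproduced here, so the following Python sketch uses hypothetical means and group sizes; the function name `weighted_mean` is likewise an assumption for illustration.

```python
def weighted_mean(group_means, group_sizes):
    """Mean of group means, each weighted by the size of its group."""
    if len(group_means) != len(group_sizes):
        raise ValueError("one size is needed for each group mean")
    total = sum(group_sizes)
    return sum(m * n for m, n in zip(group_means, group_sizes)) / total


# Hypothetical percentages and group sizes (NOT Pienaar's data)
college_means = [70, 60, 50]
college_sizes = [100, 200, 100]
print(weighted_mean(college_means, college_sizes))
```

The weighting matters when colleges differ in size: a simple average of the college means would let a small college count as much as a large one.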

During 1983 a major part of the test instrument was administered to a smaller group of college students selected from the original large group. No statistically significant difference between the scores of the two administrations was found, which confirmed the test-retest reliability of the instrument (Pienaar, 1984:9).

The tests underwent ongoing item analysis and refinement. By the time the final version was submitted to school pupils in the Mmabatho/Mafikeng area in 1984, 30% of the items had been revised. As a result of continuous item analysis, a further 18% of the items were revised (Pienaar, 1984:9).

An important point is that these aforementioned results claim to represent the reading ability of college students, who are supposed to be more proficient in English than school pupils. Final year student teachers only obtained a score of between 40% and 60% on Step 2 – see weighted scores above. (Step 2 is used in this investigation for Grade 7 pupils). These low scores indicate that the reading level of the student teachers, who were to start teaching the following year, was disturbingly no higher than the level of many of the pupils they would eventually teach.

In the test battery I used two tests – Form B and Form D – of Step 2. (Pienaar used four tests per Step). I shall show that two tests are sufficient to distinguish levels of proficiency. The two tests are presented below with the practice exercise:

Pienaar’s Practice exercise

(Pienaar does not provide the answers for this practice exercise. Possible answers are provided in brackets).

The 1 (rain) started falling from the sagging black 2 (clouds) towards evening. Soon it was falling in torrents. People driving home from work had to switch their 3 (headlights) on. Even then the 4 (cars, traffic) had to crawl through the lashing rain, while the lightning flashed and the 5 (thunder) roared.

Cloze Test 1: Form B Step 2 (Pienaar, 1984:59):

A cat called Tabitha

Tabitha was a well-bred Siamese lady who lived with a good family in a shiny white house on a hill overlooking the rest of the town. There were three children in the family, and they all loved Tabitha as much 1 she loved them. Each night she curled up contentedly on the eldest girl’s eiderdown, where she stayed until morning. She had the best food a cat could possibly have: fish, raw red mince, and steak. Then, when she was thirsty, and because she was a proper Siamese and did 2 like milk, she lapped water from a blue china saucer.

Sometimes her mistress put her on a Cat show, and there she would sit in her cage on 3 black padded paws like a queen, her face and tail neat and smooth, her black ears pointed forward and her blue 4 aglow.

It was on one of these cat shows that she showed her mettle. The Judge had taken her 5 of her cage to judge her when a large black puppy ran into the hall. All the cats were furious and snarled 6 spat from their cages. But Tabitha leapt out of the judge’s arms and, with arched 7 and fur erect, ran towards the enemy. The puppy 8 his tail and prepared to play. Tabitha growled, then, with blue eyes flashing, she sprang onto the puppy’s nose. Her 9 were razor-sharp, and the puppy yelped, shook her off, and dashed for the door. Tabitha then stalked back down the row of cages to where she had 10 the judge. She sat down in front of him and started to preen her whiskers as if to say, “Wait a minute while I fix myself up again before you judge me.” She was quite a cat, was Tabitha!

Answers. (The words in round brackets are Pienaar’s suggested alternative answers. The words in square brackets are my suggested alternative answers):

1. as; 2. not; 3. her [four, soft]; 4. eyes (eye); 5. out; 6. and; 7. back (body); 8. wagged, twitched (waved, lifted); 9. claws (nails); 10. left (seen, met).
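Scoring against a key of this kind can be sketched in Python. The key below is transcribed from the answers listed above; treating the first-listed answers as “primary” and all bracketed answers as “alternatives”, the function name `score_cloze` and the sample responses are assumptions made for illustration, not Pienaar's scoring procedure.

```python
# Item number -> (primary answers, alternative answers), from the key above
ANSWER_KEY = {
    1: ({"as"}, set()),
    2: ({"not"}, set()),
    3: ({"her"}, {"four", "soft"}),
    4: ({"eyes"}, {"eye"}),
    5: ({"out"}, set()),
    6: ({"and"}, set()),
    7: ({"back"}, {"body"}),
    8: ({"wagged", "twitched"}, {"waved", "lifted"}),
    9: ({"claws"}, {"nails"}),
    10: ({"left"}, {"seen", "met"}),
}


def score_cloze(responses, key=ANSWER_KEY, accept_alternatives=True):
    """Count correct responses; unanswered items score zero."""
    score = 0
    for item, (primary, alternatives) in key.items():
        answer = responses.get(item, "").strip().lower()
        allowed = primary | alternatives if accept_alternatives else primary
        if answer in allowed:
            score += 1
    return score


# Hypothetical responses from one test taker
responses = {1: "as", 2: "not", 3: "soft", 4: "eye", 5: "out",
             6: "but", 7: "back", 8: "waved", 9: "nails", 10: "left"}
print(score_cloze(responses))
print(score_cloze(responses, accept_alternatives=False))
```

Running the function with and without the alternatives shows how much the choice of scoring method can move an individual score on a ten-item test.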

Cloze Test 2: Form D Step 2 (Pienaar, 1984:61):

A dog of my own

When I was ten all 1 wanted was a dog of my own. I yearned for a fluffy, fat, brown and white collie puppy. We already had two old dogs, but my best friend’s pet collie had 2 had seven fluffy, fat, brown and white puppies, and I longed for one with all my heart. However, my mother said no, so the seven puppies were all sold. I had horses, mice, chickens and guinea-pigs, and as my 3 said, I loved them all, but I wasn’t so keen on finding them food. Since she had five children to look after, it made her angry to 4 hungry animals calling, so she said crossly, “No more dogs.”

This didn’t stop me wanting one though, and I drew pictures of collie dogs, giving 5 all names, and left them lying around where she would find them. As it was 6 Christmas, I was sure that she would relent and give me a puppy for Christmas.

On Christmas morning I woke up very excited, 7 the soft little sleepy bundle that I wanted at the bottom of the bed wasn’t there. My mother had given me a book instead. I was so disappointed that I cried to myself, yet I tried not to 8 her how sad I was. But of course she noticed.

Soon after that my father went off to visit his brother and when he came back he brought me a puppy. Although it 9 a collie it was podgy and fluffy, and I loved him once. My mother saw that I looked after him properly and he grew up into a beautiful grey Alsatian. We were good friends for eleven happy 10 before he went to join his friends in the Animals’ Happy Hunting Ground.


Answers:

1. I; 2. just, recently; 3. mother (mummy, mum, mom); 4. hear; 5. them; 6. near (nearly, nearer, close to); 7. but, however (though); 8. show (tell); 9. wasn’t (was not); 10. years.

6. Results

The results involve the following statistical data: parallel reliability, means, standard deviations, z-scores, and frequency distributions.

The Pearson r correlation formula measures the parallel reliability between two separate, but equivalent, tests. Henning (1987:82) explains: “In general, the procedure for calculating reliability for using parallel forms is to administer the tests to the same persons at the same time and correlate the results as indicated in the following formula:

$r_{tt} = r_{A,B}$ (Pearson r formula)

where $r_{tt}$ = reliability coefficient, and $r_{A,B}$ = the correlation of form A [in our case, Cloze Test 1] with form B [in our case, Cloze Test 2] of the test when administered to the same people at the same time.” The Pearson r for the two cloze tests in this investigation was .79.
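The parallel-forms calculation amounts to a standard Pearson correlation between each subject's score on the first test and the same subject's score on the second. A minimal Python sketch follows; the function name `pearson_r` and the five pairs of scores are hypothetical and are not the data that produced the coefficient reported in this article.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("two paired lists of at least two scores needed")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical scores of five subjects on two parallel ten-item tests
test_1 = [2, 4, 5, 7, 9]
test_2 = [3, 3, 6, 6, 9]
print(round(pearson_r(test_1, test_2), 2))
```

The pairing is essential: the scores must be listed in the same subject order in both lists, since the coefficient describes how far subjects keep their relative standing from one form to the other.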

The summary statistics of the L1 and L2 groups are reported in Table 2, followed by the frequency distributions of these two groups in Figures 1 and 2.

Table 2. Summary statistics of the L1 and L2 Groups

Figure 1. Frequency Distribution of Test 1

Figure 2. Frequency Distribution of Test 2

A perfect score on a cloze test indicates that the pupil has fully mastered that particular level (Pienaar, 1984). Thus a score of 70% or lower would indicate that the pupil is not ready for the next stage. In the light of these comments, I examine the individual scores of the L1 Group in Table 3 because one would expect this group to get a score of at least 7, owing to the fact that they were taking English as a first language course subject.

Table 3. Cloze Scores (in Ascending Order) of Ethnic Subgroups within the L1 Group

(C = Coloured; I = Indian; Tsw = Tswana; W = White; R = Replacement language)

I first discuss the scores of the Coloureds and Indians in Table 3. The data in Table 3 are summarised in Figure 3.

Figure 3. Summary data of Coloureds and Indians

An appreciable number of Coloureds and Indians use English as a replacement language, which is a language that becomes more dominant than the mother tongue, usually at an early age, but is seldom fully mastered. The situation with Coloured and Indian children is that many of them speak a bit of English and a bit of either Afrikaans or an Indian language. A swing might occur towards English, and it might seem that Afrikaans or the Indian language has been replaced by English. What often happens instead is that basic English skills are never fully mastered: the result is a hybrid of English and Afrikaans, or English and an Indian language. A difficulty with replacement language pupils is that cognitive development could be inhibited when basic language skills have not been mastered at an early age in one language or, in some bilingual situations, in two languages. Now consider the scores of six points and below six in Figure 3. Of the 13 Indian and Coloured subjects, eight obtained a score of six and below. These were probably replacement language subjects, because it is unlikely that mother tongue subjects, as a rule, would obtain such low scores on a test that was pitched at the L2 level. Indeed the majority of the L1 Tswana subjects, who were mother tongue speakers of Tswana, obtained scores above six. Eight of the Tswana subjects who had a score of six or less changed from English first language as a course subject in Grade 7 (the year in which the cloze tests were given) to English second language as a course subject after Grade 7.

7. Discussion

The main issue in the classification of subjects/pupils is that one’s classification should not be a priori based (i.e. based on pupils’ or teachers’ preconceptions) but empirically based (Baker, 1988:408) on valid and reliable norm-referenced tests. Such a solution may clash with the “outcomes” approach to testing, which tends to eschew comparisons of scores between individuals or between groups (HSRC, 1995a, 1995b).

Also, it might be argued that measuring the difference in means between groups is not useful because it apportions equivalent scores to each item, and accordingly does not take into account the relative level of difficulty of items, as would be done in an item analysis. I suggest that the relative difficulty of items is not important in a language proficiency test, but is indeed important in a diagnostic test, which has remediation as its ultimate purpose (as in Markham, 1984, mentioned in the literature review above). With regard to proficiency tests, one is concerned with overall, or general, or global proficiency that is specified for a certain level (e.g. elementary, intermediate or advanced) for specific people at a specific time and in a specific situation. These levels are determined by theory. Within each level there are difficult and easy items. To attain a specific level of proficiency one has to get most of the items right, the difficult and the easy ones alike. In sum, the different bits of language have to hang together.
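The contrast drawn above between an equally weighted total score and an item analysis can be sketched in a few lines of Python. This is only an illustration: the response matrix, the pupil labels and the function names are hypothetical and are not taken from Pienaar's tests or from the data in this article.

```python
from statistics import mean

# Hypothetical 0/1 responses (1 = correct restoration of the deleted word).
# One row per pupil, one column per cloze item. Not data from this study.
responses = {
    "pupil_a": [1, 1, 1, 0, 1],
    "pupil_b": [1, 0, 1, 0, 0],
    "pupil_c": [1, 1, 0, 0, 1],
    "pupil_d": [1, 1, 1, 1, 1],
}


def total_scores(responses):
    """Proficiency view: every item counts one point, whatever its difficulty."""
    return {pupil: sum(row) for pupil, row in responses.items()}


def item_facility(responses):
    """Diagnostic view: proportion of pupils who answered each item correctly."""
    rows = list(responses.values())
    n_items = len(rows[0])
    return [mean(row[i] for row in rows) for i in range(n_items)]


print(total_scores(responses))   # one total per pupil
print(item_facility(responses))  # one facility value per item
```

The totals are what a comparison of group means would use, whereas the facility values single out the items that few pupils answered correctly (here the fourth item), which is the kind of information a diagnostic test with remediation in view would need.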

Statistics will tell us a great deal about the level of proficiency of individuals and groups, while a diagnosis in the form of an error analysis will help in ascertaining items for remediation.
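As a rough illustration of what a group-differences comparison involves, the following Python sketch computes the difference in mean scores between two groups and a Welch t statistic for that difference. The scores and group labels are hypothetical, and the choice of Welch's statistic is an assumption made for the example; it is not necessarily the procedure used to obtain the results reported in this article.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means.

    Uses sample variances and does not assume equal variances.
    """
    mean_diff = mean(group_a) - mean(group_b)
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return mean_diff / standard_error


# Hypothetical scores out of 10 on a ten-deletion cloze test (not study data).
l1_scores = [8, 9, 7, 10, 8, 9]
l2_scores = [4, 6, 5, 3, 6, 5]

print("difference in means:", mean(l1_scores) - mean(l2_scores))
print("Welch t:", welch_t(l1_scores, l2_scores))
```

A large t value relative to its critical value would suggest that the test separates the two groups, which is the sense in which group differences bear on construct validity. An error analysis of individual items would still be needed to decide what to remediate.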

As the results show, cloze tests with only ten deletions distinguish clearly between levels of proficiency, which is an important factor in the construct validity of a test. This brings me back to the validity of these cloze tests, or of any cloze tests, as an indicator of global, or general, proficiency, that is, as the “One Best Test” (Alderson, 1981). The “One Best Test” notion is closely related to the question of whether language proficiency consists of a unitary factor analogous to a g factor in intelligence, or of a number of independent factors. The debate has gone on for at least three decades, receiving prominence in the work of Carroll (1961, 1983) and Oller (1979, 1983, 1983a). This question is of immense practical importance in language testing. Alderson (1981:190) discusses the “One Best Test” argument and concludes that

regardless of the correlations, and quite apart from any consideration of the lack of face validity of the One Best Test, we must give testees a fair chance by giving them a variety of language tests, simply because one might be wrong: there might be no Best Test, or it might not be the one we chose to give, or there might not be one general proficiency factor, there may be several.

It would be very difficult to find the “One Best” or “perfect” test. The problem has to do not only with construct validity but also with face validity, because even if one managed to find a “perfect” or “One Best” test (one cloze test with 10 items!), it would not find general acceptance, owing to the fact that it would lack face validity, i.e. it would not look at all as if it could predict global proficiency. Decisions based on testing often affect people’s lives; one should therefore use a variety of tests.

The ultimate interest in language proficiency lies in its effect on academic achievement. There is little dispute that low English proficiency, where English is the medium of learning, goes together with educational failure (Gamaroff, 1995a). This does not necessarily mean, of course, that low English proficiency is the direct or only cause of educational failure. There is much evidence to indicate that low proficiency must be partly responsible for academic failure. But we are also aware that academic failure is much more than language failure, and indeed than second language failure, i.e. failure in using a main language or a second language as a medium of learning, for example English (Clayton, 1996; Gamaroff, 1995, 1996, 1997b; Winkler, 1997).

In this investigation, neither Pienaar nor I have provided hard data that these cloze tests are indeed valid predictors of academic achievement: so why should one be persuaded that they are, just because Pienaar says so? The absence of such data in Pienaar (1984), however, does not detract from the value of Pienaar’s cloze tests as a measure of language proficiency. After all, Pienaar’s (1984) monograph was only a pilot survey.

In unpublished research, I did a longitudinal study (Grade 7 to Grade 12) of these cloze tests as predictors of English achievement and general academic achievement (aggregate scores) with the same sample of subjects used in this article. The results of the longitudinal study are intended for publication in the near future.

8. Bibliography

Aitchison, J. 1987. Words in the mind: An introduction to the mental lexicon. Oxford: Basil Blackwell.

Alderson, J.C. 1978. A study of the cloze procedure with native and non-native speakers of English. Unpublished Ph.D. Dissertation. Edinburgh: University of Edinburgh.

Alderson, J.C. 1979. The cloze procedure and proficiency in English as a foreign language, TESOL Quarterly, 13:219-227.

Alderson, J.C. 1980. Native and nonnative speaker performance on cloze tests, Language Learning, 30(1):59-77.

Alderson, J.C. 1981a. Report of the discussion on general language proficiency. In: Alderson, J.C. & Hughes, A. Issues in language testing: ELT Documents III. The British Council.

Alderson, J.C. and Clapham, C. 1992. Applied linguistics and language testing. A case study of ELTS test. Applied Linguistics, 13(2):149-167.

Anderson, J. 1976. Psycholinguistic experiments in foreign language testing. Queensland: University of Queensland Press.

Bachman, L.F. 1982. The trait structure of cloze test scores, TESOL Quarterly, 16(1):61-70.

Bachman, L.F. 1985. Performance on cloze tests within fixed-ratio and rational deletions, TESOL Quarterly, 19(3).

Baker, C. 1988. Normative testing and bilingual populations. Journal of Multilingual and Multicultural Development, 9(5):399-409.

Besner, N. 1985. Process against product: A real opposition. English Quarterly, 18(3):9-16.

Bormuth, J. 1964. Mean word depth as a predictor of comprehension difficulty, California Journal of Educational Research, 15:226-231.

Brown, J.D. 1983. A closer look at cloze: Validity and reliability, Oller, J.W., Jr. (Ed.). Issues in language testing research. Rowley, Massachusetts: Newbury Publishers.

Carroll, J.B. 1961. Fundamental considerations in testing for English language proficiency of foreign language students. Washington, D.C: Center for Applied Linguistics.

Carroll, J.B. 1993. Human cognitive abilities: A survey of factor analytic studies. Cambridge: Cambridge University Press.

Chihara, T., Sakurai, T. & Oller, J.W. Jr. 1989. Background and culture as factors in EFL reading comprehension. Language Testing, 6(2):143-151.

Clark, J.L.D. 1983. Language testing: Past and current status – Directions for the future, Language Testing, 64(4):431-443.

Clayton, E. 1996. Is English really the culprit? Investigating the content versus language distinction. Per Linguam, 12(1):24-33.

Corder, S.P. 1981. Error analysis and interlanguage. Oxford: Oxford University Press.

Cziko, G.A. 1982. Improving the psychometric, criterion-referenced, and practical qualities of integrative testing, TESOL Quarterly, 16(3):367-379.

Fotos, S. 1991. The cloze test as an integrative measure of EFL proficiency: A substitute for essays on college entrance examinations. Language Learning, 41(3):313-336.

Gamaroff, R. 1995a. Affirmative action and academic merit. Forum, 1(1). Journal of the World Council of Curriculum Instruction, Region 2, Africa South of the Sahara, Lagos.

Gamaroff, R. 1995b. Solutions to academic failure: The cognitive and cultural realities of English as a medium of instruction among black learners. Per Linguam, 11(2):15-33.

Gamaroff, R. 1996. Is the (unreal) tail wagging the (real) dog?: Understanding the construct of language proficiency. Per Linguam, 12(1):48-58.

Gamaroff, R. 1997a (Forthcoming). Paradigm lost, paradigm regained: Statistics in language testing. Journal of the South African Association of Language Teaching (SAALT).

Gamaroff, R. 1997b. Language as a deep semiotic system and fluid intelligence in language proficiency. South African Journal of Linguistics, 15(1):11-17.

Gamaroff, R. Forthcoming. Dictation as a test of communicative proficiency, International Review of Applied Linguistics.

Geyer, J.R. 1968. Cloze Procedure as a predictor of comprehension in secondary social studies materials. Olympia, Washington: State Board for Community College Education.

Goodman, K.S. 1969. Analysis of oral reading miscues: Applied psycholinguistics, Reading Research Quarterly, 5:9-30.

Hale, G.A., Stansfield, C.W. & Duran, R.P. 1984. TESOL Research Report 16. Princeton, New Jersey: Educational Testing Service.

Henning, A. 1987. A guide to language testing. Rowley, Massachusetts: Newbury House.

HSRC. 1995. Ways of seeing the National Qualifications Framework. Pretoria: Human Sciences Research Council.

HSRC. 1996. Language assessment and the National Qualifications Framework. Pretoria: Human Science Research Council Publishers.

Hughes, A. 1981. Conversational cloze as a measure of oral ability, English Language Teaching Journal, 35(2):161-168.

Ingram, E. 1964. English Language Battery (ELBA). Edinburgh: Department of Linguistics, University of Edinburgh.

Ingram, E. 1973. English standards for foreign students, University of Edinburgh Bulletin, 9:4-5.

Ingram, E. 1985. Assessing proficiency: An overview on some aspects of testing, Hyltenstam, K. & Pienemann, M. Modelling and Assessing second language acquisition. Clevedon, Avon: Multilingual Matters Ltd.

Jacobs, B. 1988. Neurobiological differentiation of primary and secondary language acquisition, Studies in Second Language Acquisition, 33:247-52.

Jeffery, C.D. 1990. The case for grammar: Opening it wider, South African Journal of Higher Education, Special edition.

Johnson, F.C. & Kin-Lin, C.W.L. 1981. The interdependence of teaching, testing, and instructional materials, Read, J.A.S. (Ed.). Directions in language testing. Singapore: Singapore University Press.

Macdonald, C. A. 1990. Crossing the threshold into standard three in black education: The consolidated main report of the threshold project. Pretoria: Human Sciences Research Council.

Macdonald, C. A. 1990a. English language skills evaluation (A final report of the Threshold Project). Report Soling-17. Pretoria: Human Sciences Research Council.

Maclean, M. 1984. Using rational cloze for diagnostic testing in L1 and L2 reading, TESL Canada Journal, 2:53-63.

Markham, P. 1985. The rational deletion cloze and global comprehension in German, Language Learning, 35:423-430.

Oller, J.W., Jr. 1973. Cloze tests of second language proficiency and what they measure, Language Learning, 23(1):105-118.

Oller, J.W., Jr. 1976. Cloze, discourse, and approximations to English, Burt, K. & Dulay, H.C. (Eds.). New directions in second language learning, teaching and bilingual education. TESOL: Washington, D.C.

Oller, J.W., Jr. 1979. Language tests at school. London: Longman.

Oller, J.W., Jr. 1983. A consensus for the 80s. In: Issues in language testing research. Rowley, Massachusetts: Newbury Publishers.

Oller, J.W., Jr. 1983a. “g”, what is it? In: Hughes, A. and Porter, D. (Eds.). Current developments in language testing. London: Academic Press.

Oller, J.W., Jr. 1983b. Issues in language testing research. Rowley, Massachusetts: Newbury Publishers.

Oller, J.W. Jr. 1995. Adding abstract to formal and content schemata: Results of recent work in Peircean semiotics. Applied Linguistics, 16(3):273-306.

Oller, J.W. Jr. & Jonz, J. (Eds.). 1994. Cloze and coherence. Cranbury, N.J.: Bucknell University Press.

Pienaar, P. 1984. Reading for meaning: A pilot survey of (silent) reading standards in Bophuthatswana. Mmabatho: Institute of Education, University of Bophuthatswana.

Piper, T. & McEachern, W.R. 1988. Content bias in cloze as a general language proficiency indicator. English Quarterly, 21(1):41-48.

Porter, D. 1978. Cloze procedure and equivalence, Language Learning, 28(2):333-41.

Schank, R.C. 1982. Reading and understanding: Teaching from the perspective of artificial intelligence. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Spolsky, B. 1985. The limits of authenticity in language testing. Language Testing, 2:31-40.

Stubbs, J. & Tucker, G. 1974. The cloze test as a measure of English proficiency, Modern Language Journal, 58:239-241.

Taylor, W. 1953. Cloze procedure: A new tool for measuring readability, Journalism Quarterly, 30:414-438.

Walker, R., Rattanavich, S. & Oller, J.W. Jr. 1992. Teaching all the children to read. Buckingham, England: Open University Press.

Weaver, W.W. & Kingston, A.J. 1963. A factor analysis of the Cloze procedure and other measures of reading and language ability, Journal of Communication, 13:252-261.

Weintraub, S. 1968. The cloze procedure, The Reading Teacher, 6:21, 567, 569, 571, 607.

Weir, C.J. 1993. Understanding and developing language tests. London: Prentice Hall.

Winkler, G. 1997. The myth of the mother tongue: Evidence from Maryvale College, Johannesburg. South African Journal of Applied Language Studies, 5(1):29-39.
