part of a larger collective activity, one which is deliberate, constructed for a particular purpose. It involves the efforts of many
others in addition to the individual whose performance is 'in the
spotlight'.
This chapter presents a perspective on assessment which focuses
on the larger framing and social meaning of assessment. Such
a perspective has drawn on diverse fields including sociology,
political and cultural theory, and discourse analysis for its analytic tools and concepts, together with an expanded notion of test
validity.
The institutional character of assessment
The individualized and individualizing focus of traditional
approaches described so far is really rather surprising when we
consider the inherently institutional character of assessment.
When test reforms are introduced within the educational system,
they are likely to figure prominently in the press and become
matters of public concern. This is because they impinge directly
on people's lives. When an assessment is made, it is not done by
someone acting in a private capacity, motivated by personal
curiosity about the other individual, but in an institutional role,
and serving institutional purposes. These will typically involve
the fulfilment of policy objectives in education and other areas of
social policy. And social practice raises questions of social
responsibility.
Assessment and social policy
Language tests have a long history of use as instruments of social
and cultural exclusion. One of the earliest recorded instances is
the shibboleth test, mentioned in the Old Testament. Following a
decisive military battle between two neighbouring ethnic groups,
members of the vanquished group attempted to escape by blending in with their culturally and linguistically very similar victors.
The two groups spoke varieties of a single language and it was typically possible to distinguish between speakers of either variety by the way they pronounced words beginning with a sibilant
sound. The victors pronounced such words with an [sh] sound,
the vanquished with the sound [s]. So the word 'shibboleth' (meaning according to some authorities 'an ear of wheat', others 'a stream') was used as a single item language test by the victorious group in order to detect the enemy in their midst. Individuals
suspected of being members of the vanquished were asked to say
this word, and if they pronounced it 'sibboleth', they failed the
test. In this case, failure was fatal since the test-takers were immediately put to death. Poor performance on a test may have serious
consequences, though fortunately not usually as dire as this.
Notice that the test here is a test of authenticity of identity,
rather than of proficiency; a single instance is enough to betray
the identity which the test aims to detect. A more recent instance
of a detection test is the proposal in the 1960s, but never implemented, for a language test to be used by the Royal Canadian
Mounted Police to exclude homosexual recruits. Word lists
which included some items of homosexual slang (words such as
camp, cruise, fruit, and trade) would be presented to recruits, and
the sweatiness of their palms (a sign of nervousness) would be
measured electrically. It was assumed that only homosexuals
familiar with the subculture in which these terms were used, with
secondary slang meanings, would recognize and respond to the
ambiguity of the terms. They would become nervous, sweat, and
be detected. In this test, a perfect score was zero!
More conventional proficiency tests have also been used for
purposes of exclusion. Prior to the Second World War the
Australian Government used a language test as part of their policy to exclude immigrants other than those coming from the British Isles. Those applying to immigrate could be administered a dictation test in any language selected by the immigration officer. If the person passed the test in English, then any one of a
range of other languages could be used until the candidate failed.
In one notorious case, a Hungarian Jewish refugee from Hitler's
persecutions applied for immigrant status. He was a polyglot and
passed the test in a number of languages before finally failing in
Gaelic, thereby being refused entry and thus facing a tragic fate in
Europe. The blatancy of such a practice is not readily replicated
elsewhere, but it illustrates the possibility that language tests can
form part of a politically and morally objectionable policy.
Assessment and educational policy
Assessment serves policy functions in educational contexts, too.
One example is in the area of vocational education and training
for adults. Most industrialized countries have, in recent years,
responded to the need for the upgrading of the workforce in the
face of rapid technological change by developing more flexible
policies for the recognition and certification of specific work-related skills, each of which may be termed a competency. National competency frameworks, consisting of an ordered series of 'can-do' statements describing levels of performance on relevant job-related tasks, have been adopted. Language and literacy competency frameworks have been developed as part of these policies.
In international education, tests are used to control access to
educational opportunities. Typically, international students need
to meet a standard on a test of language for academic purposes
before they are admitted to the university of their choice. Is this
reasonable? Should access to educational opportunity be restricted
on the basis of a language test? If it is agreed that some assessment
of language ability is reasonable in this context, then questions
arise regarding the level of proficiency to be required, and how
this should be determined. Further, should the assessment of lan
guage proficiency be carried out within the context of perfor
mance on typical academic tasks? But then, does this not mean
that those who have had some experience of such tasks have the
advantage over those who do not? If this is so, then one might
question the fairness of such tasks as instruments for the testing of
language ability. One can also raise the question of how native
speakers might perform on such integrated tasks, and why, given
that they are admitted to the same courses of study, they should
not also be required to subject themselves to assessment.
The social responsibility of the language tester
The policies and practices discussed in the preceding two sections
throw up a host of questions about fairness, and about the policy
issues surrounding testing practice. They also raise the question
of the responsibilities of language testers. Recently, serious attention has been given to these issues for the first time, an overdue
development, one might say, given the essentially institutional
character of testing.
Imagine the following situation involving the use of language
tests within immigration policy. You live in an English-speaking
country which accepts substantial numbers of new settlers each
year. The current immigration policy distinguishes between categories of intending settlers. The claims of refugees are privileged in various ways, as are those of family members of local citizens (settled immigrants have the right to apply to bring into the country parents who are living in the country of origin). English language proficiency and knowledge of local cultural practices have
not been a criterion in selection in such cases. A further category of
individuals with no prior connection to the country, and who are
not refugees, may also apply for immigration; but the selection
process for them is much tougher-approximately only one in ten
who apply is granted permission to settle. Selection criteria for this
category of applicants include educational level, type of work
expertise, age, and proficiency in English, among other things.
English language proficiency is currently assessed informally by an
immigration officer at the time of interview. The immigration
authorities approach you to be part of a team commissioned with
the development of a specific test for the purpose of more accurately determining the proficiency of intending immigrants in this
category. What ethical issues do you face?
On the one hand, the advent of the new test might appear to
promote fairness. Obviously, as judgements in the current informal procedures are not made by trained language evaluators, and no quality control procedures are in place, there are inconsistencies in standards, and hence unfairness to individuals. A carefully constructed test, both more relevant in its content, and more reliable in its decisions, appears on the face of it to be fairer for the majority. On the other hand, the introduction of such an instrument raises worrying possibilities. Might not the authorities, once
it is in place, be tempted to use it on previously exempt categories,
for example refugees or family members? Who will be in charge
of interpreting scores on the test? Who will set cut-scores for
'passing' and 'failing'? In response to your inquiries on this point,
you are informed that cut-scores will vary according to the
requirements of immigration policy, higher when there is political
pressure to restrict immigration numbers, lower when there is a
labour shortage and immigrant numbers are set to rise. The political nature of the test is revealed by such facts-where does that
leave you as a socially responsible test developer? Should you
refuse to get involved?
Such cases raise issues of the ethics of language testing practice,
which are becoming a matter of considerable current debate. We
can distinguish two views, both of which acknowledge the social
and political role of tests. One holds that language testing practice
can be made ethical, and stresses the individual responsibility of testers to ensure that it is. The other sees tests as essentially
sociopolitical constructs, which, since they are designed as instruments of power and control, must therefore be subjected to the
same kind of critique as are all other political structures in society.
We may refer to the first view as ethical language testing; the latter
is usually termed critical language testing.
Ethical language testing
Those who argue that language testing can be an ethical activity
take either a broader or more restricted view of the ethics of
testing. We can call the former the social responsibility view, the
latter the traditional view.
Those who advocate the position of socially responsible language testing reject the view that language testing is merely a scientific and technical activity. They appeal to recent developments in thinking about validity, especially to the notion of consequential validity. In general, this means that evaluation of a
test's validity needs to take into account the wanted and
unwanted consequences that follow from the introduction of the
test. Some take the view that consequential validity, like validity
of other kinds (as discussed in Chapter 5 ), is the responsibility of
the test developer and needs to be taken into account, not only by
anticipating possible consequences in test design, but also by
monitoring its effects in implementation.
Generally, this expanded sense of responsibility sees ethical testing practice as involving test developers in taking responsibility for the effects of tests. There are three main areas of concern here. One of these is accountability. This has to do with a sense of responsibility to the people most immediately affected by the test, principally the test-takers, but also those who will use the information it provides. The test (and hence the test developer) needs to
be accountable to them. A second area relates to the influence that
testing has on teaching, the so-called washback effect. The third
involves a consideration of the effect of a test beyond the classroom, the ripples or waves it makes in the wider educational and
social world: what we can call the test impact.
Accountability
Ethical testing practice is seen as involving making tests accountable to test-takers. Test developers are typically more preoccupied
with satisfying the demands of those commissioning the test, and
with their own work of creating a workable test. Test-takers are
seldom represented on test development committees which supervise the work of test development, and represent the interests of stakeholders. Minimally, accountability would require test developers to provide test-takers with complete information on what is
expected of them in the test. Such information is often provided in
the form of a test users' handbook or manual, which provides
information on the rationale for the test and its structure, general
information on its content and the format of items, and sample
items.
More substantially, test developers should be required to
demonstrate that the test content and format are relevant to candidates, and that the testing practice is accountable to their needs and interests. Too often, traditional testing procedures and formats may be preferred even in situations where they are no longer
relevant. For example, British examinations originally developed
for the British secondary school system are still used in Africa,
despite the inappropriateness of their content and format.
An aspect of accountability is the question of determining the
norms of language behaviour which will act as a reference point
in the assessment. This will include issues such as the appropriate variety of the language to be tested. In an era where no single variety of English constitutes a norm everywhere, the question arises of how much of the variation among English speakers it is appropriate to include in a test.
Consider, for example, the TOEFL test, used primarily for selection of international students to universities in the United States. Given the diversity of varieties of English, both native and non-native, typically encountered in the academic environment there,
it might be argued that it is responsible to include examples of
those varieties in the test rather than to include only samples of
the standard variety.
Washback
The power of tests in determining the life chances of individuals
and in influencing the reputation of teachers and schools means
that they can have a strong influence on the curriculum. The effect
of tests on teaching and learning is known as test washback.
Ethical language testing practice, it is felt, should work to ensure positive washback from tests.
For example, it is sometimes argued that performance assessments have better washback than multiple choice test formats or other individual item formats, such as cloze, which focus on isolated elements of knowledge or skill. As performance assessments require integration of knowledge and skills in performance on realistic tasks, preparation for such assessments will presumably encourage teachers and students to spend time engaged in performance of such tasks as part of the teaching. In contrast, multiple choice format item tests of knowledge of grammar or vocabulary may inhibit communicative approaches to learning and teaching.
Authorities responsible for assessment sometimes use assessment
reform to drive curriculum reform, believing that the assessment
can be designed to have positive washback on the curriculum.
However, research both on the presumed negative washback of conservative test formats, and on the presumed positive washback of communicative assessment (assumed to be more progressive) has shown that washback is often rather unpredictable. Whether or not the desired effect is achieved will depend on local conditions in classrooms, the established traditions of teaching, the immediate motivation of learners, and the frequently unpredictable ways in which classroom interactions develop. These can only be established after the event, post hoc, on the basis of information collected once the reform has been introduced.
Test impact
Tests can also have effects beyond the classroom. The wider effect
of tests on the community as a whole, including the school, is
referred to as test impact. For example, the existence of tests such
as TOEFL, used as gatekeeping mechanisms for international education, and administered to huge numbers of candidates all over the world, has effects beyond the classroom, in terms of educational policy and the allocation of resources to education. In certain areas of the world, university selection is based directly on
performance in the assessments of the senior year of high school.
This has often led to the existence of tightly controlled formal
examinations, partly in order to make what tended to become a
very competitive assessment as psychometrically reliable as possible. However, in an era where most students are completing a
secondary education, such an assessment no longer meets the
needs of the majority of students. A curriculum and assessment
reform in favour of continuous assessment and the completion of
projects and assignments in such a case would have widespread
impact on families, universities, employers, and employment and
welfare services. In fact, in one such case, part of the impact of the
reform was to open the door to abuses of the assessment process
by wealthy families, who could afford to hire private tutors to
coach their children through the projects they had to complete in
order to gain the scores they needed to enter the university of their
choice. Test impact is likely to be complex and unpredictable.
Codes of professional ethics for language testers
In contrast to those advocating the direct social responsibility of
the tester, a more traditional approach involves limiting the social
responsibility of language testers to questions of the professional
ethics of their practice. In this view, the approach to the ethics of
language testing practice should be the same as that taken within
other areas of professional practice, such as medicine or law.
Professional bodies of language testers should formulate codes of
practice which will guide language testers in their work. The
emphasis is on good professional practice: that is, language testers should in general take responsibility for the development of quality language tests. The larger questions of the politics of
language testing fall not so much within the domain of the ethics
of language testing practice as such; instead they represent the
ethical questions that all citizens must face-for example, on
issues such as capital punishment, abortion and the like.
Those taking this view understand consequential validity as
concerning consequential impediments to the interpretability of
test scores. For example, in the case of the notorious Australian
dictation test discussed earlier, test developers were presumably
aware of the uses to which the test was to be put. But instead of
arguing that language testers have an ethical responsibility to
object to the policy behind the test in such a case, it may be sufficient (and arguably more effective) to oppose the test on the basis
of professional validity arguments. What is wrong with this test is
that there was only one acceptable inference possible from the
test: that the test-taker was unsuitable for acceptance into
Australia. Proficiency in the range of languages tested was not relevant to the question of the person's suitability for settlement in
Australia. The problem with the test, in this view, is that the test
construct is not meaningful or interpretable in this context. It is
not a valid test. The fact that it constitutes an offence against
social justice thus does not need to be addressed directly; rather,
the test is found wanting within an expanded theory of validity,
that is, one which includes consequential validity.
Critical language testing
A much more radical view of the social and political role of tests
is being formulated as part of the developing area known as critical
applied linguistics. This applies current social theory and critical
theory to issues within applied linguistics generally. Language testing, as a quintessentially institutional activity, is facing increasing
scrutiny from this perspective. The basic tenets of such a view are
that the principles and practices that have become established as
common sense or common knowledge are actually ideologically
loaded to favour those in power, and so need to be exposed as an
imposition on the powerless. In this view, there would be little
point in tinkering with existing institutional constructs, working
within the framework they determine. What is needed is a radical
reconstruction which changes the whole ideological foundations.
In this perspective the very concept of testing, of language or
anything else, gets redefined in socio-political terms. Critical
language testing is best understood as an intellectual project to
expose the role of tests in this exercise of power. For example, the
existence of language testing on a huge international scale-what
some have called industrialized language testing-is ripe for
critical analysis. There are hundreds of thousands of individual
administrations of the TOEFL test in any year, in a huge number
of countries; what are we to make of this phenomenon in critical terms?
From the perspective of critical language testing, the emphasis
in ethical language testing on the individual responsibility of the
language tester is misguided because it presupposes that this
would operate within the established institution of testing, and so
essentially accept the status quo and concede its legitimacy.
Critical language testing at its most radical is not reformist since
reform is a matter of modification not total replacement. At its
most radical indeed, it would not recognize testing as we know it
at all. Given this, it is perhaps unsurprising that language testers
themselves have found it difficult to articulate this critique, or
have interpreted it as implying the necessity for individual ethically responsible behaviour on the part of testers. The critique, if
and when it comes, may emerge most forcefully from outside the
field. Given the disciplinary borders of knowledge and influence
in the field, however, any criticism from outside may be heard
only with difficulty by practitioners within.
Conclusion
In this chapter we have examined the institutional character of
tests and the implications of this for understanding the nature of
language testing as a social practice, and the responsibility of language testers. Language testing, like language itself, cannot ultimately be isolated from wider social and political implications. It
is perhaps not surprising after all that the field has only belatedly
grasped this fact, and even now is uncertain about the extent to
which it is able or willing to articulate a thorough critique of its
practices. This may best be left to those not involved in language
testing. Language testers themselves meanwhile stand to benefit
from a greater awareness of language testing as a social practice.
It may lead to a more responsible exercise of the power of tests,
and a more deeply questioning approach to the questions of test
score meaning which lie at the heart of the validity of language
tests.
New directions - and dilemmas?
We live in a time of contradictions. The speed and impressiveness
of technological advance suggest an era of great certainty and
confidence. Yet at the same time current social theories under
mine our certainties, and have engendered a profound ques
tioning of existing assumptions about the self and its social
construction. Aspects of these contradictory trends also define
important points of change in language testing. The applications
of technological innovations in language testing remain for the
most part rooted in traditional modernist assumptions about the
nature of performance and the possibilities of measurement of
language ability. It is assumed, for example, that there is such a
thing as 'ability' which is located in the mind of the candidate,
which is, as it were, projected directly in performance; that the
individual candidate is solely responsible for his/her performance
in the test; and that ability can be measured more or less objectively. But it is these very individualizing modernist assumptions of testing practice which are now being challenged by new theories of performance. Language testing is a field in crisis, one which
is masked by the impressive appearance of technological advance.
Computers and language testing
Rapid developments in computer technology have had a major
impact on test delivery. Already, many important national and international language tests, including TOEFL, are moving to computer based testing (CBT). Stimulus texts and prompts are presented not in examination booklets but on the screen, with candidates being required to key in their responses. The advent of CBT has not necessarily involved any change in the test content,
which may remain quite conservative in its assumptions, but
often simply represents a change in test method.
The proponents of computer based testing can point to a number of advantages. First, scoring of fixed response items can be done automatically, and the candidate can be given a score immediately. Second, the computer can deliver tests that are tailored to
the particular abilities of the candidate. It seems inefficient for all
candidates to take all the questions on a test; clearly some are so
easy for some candidates that they provide little information on
their abilities; others are too hard to be of use. It makes sense to
use the very limited time available for testing to focus on those
items that are just within, and just beyond a candidate's threshold
of ability.
Computer adaptive tests do just this. At the beginning of the
test, a small number of common items are presented to all candidates. Depending on how an individual candidate performs on those items, he/she is subsequently presented only with items estimated to be within his/her likely ability range. The computer
updates its estimate of the candidate's ability after each response.
In this way, the test adapts itself to the candidate. Such tests
require the prior creation of an item bank, a large group of items
which have been thoroughly trialled, and whose likely difficulty
for candidates at given levels of ability has been estimated as precisely as possible.
Items are drawn from the item bank in response to the performance of the candidate on each item, until a point where a stable
and precise estimate of the candidate's ability is achieved. In this
way each candidate will receive a test consisting of a possibly
unique combination of items from the bank, a test suited precisely
to the candidate's ability. The existence of large item banks makes
possible a third advantage of computer based testing. Tests can be
provided on demand, because so many item combinations are possible that test security is not compromised. Computer adaptive tests
of grammar and vocabulary have long been available, but recently
similar tests of listening and reading skills have been developed.
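To make the logic of such an adaptive loop concrete, here is a minimal sketch of how a test of this kind might select items and update its estimate of a candidate's ability. It is an illustration only, not a description of any operational system: it assumes a simple one-parameter (Rasch) model in which each item in the bank is characterized solely by a difficulty value, and all the names in it (adaptive_test, answer_item, and so on) are hypothetical.

```python
import math
import random

def rasch_probability(ability, difficulty):
    """Probability of a correct response under a one-parameter (Rasch) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, ability=0.0, iterations=20):
    """Newton-Raphson maximum-likelihood update of the ability estimate
    from (difficulty, correct) response pairs, clamped to a sensible range."""
    for _ in range(iterations):
        probs = [rasch_probability(ability, d) for d, _ in responses]
        gradient = sum((1.0 if c else 0.0) - p for (_, c), p in zip(responses, probs))
        curvature = sum(p * (1.0 - p) for p in probs)
        if curvature < 1e-6:
            break
        ability = max(-4.0, min(4.0, ability + gradient / curvature))
    return ability

def adaptive_test(item_bank, answer_item, max_items=20, precision=0.3):
    """Administer the unused item whose difficulty best matches the current
    ability estimate, stopping once the estimate has stabilized."""
    ability = 0.0
    responses = []
    remaining = list(item_bank)
    while remaining and len(responses) < max_items:
        item = min(remaining, key=lambda d: abs(d - ability))
        remaining.remove(item)
        correct = answer_item(item)          # administer the item
        responses.append((item, correct))
        new_ability = estimate_ability(responses, ability)
        stable = abs(new_ability - ability) < precision and len(responses) >= 5
        ability = new_ability
        if stable:
            break
    return ability

if __name__ == "__main__":
    # Simulate a candidate of true ability 1.0 on a bank of 100 trialled items.
    bank = [random.uniform(-3, 3) for _ in range(100)]
    simulate = lambda d: random.random() < rasch_probability(1.0, d)
    print(round(adaptive_test(bank, simulate), 2))
```

Operational systems rely on much larger calibrated banks and more sophisticated estimation and content-balancing rules, but the stopping logic, an estimate that no longer changes appreciably, is the same in spirit.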
The use of computers for the delivery of test materials raises
questions of validity, as we might expect. For example, different
levels of familiarity with computers will affect people's performance with them, and interaction with the computer may be a
stressful experience for some. Attempts are usually made to
reduce the impact of prior experience by the provision of an
extensive tutorial on relevant skills as part of the test (that is, before the test proper begins). Nevertheless, the question about
the impact of computer delivery still remains.
Questions about the importance of different kinds of presentation format are raised or exacerbated by the use of computers. In
a writing test, the written product will appear in typeface and will
not be handwritten; in a reading test, the text to be read will
appear on a screen, not on paper. Do raters react differentially to
printed versus handwritten texts? Is any inference we might draw
about a person's ability to read texts presented on computer
screens generalizable to that person's ability to read texts printed
on paper, and vice versa? In computerized tests of written composition, composing processes are likely to be different, because of word processing capacities available on the computer. Do such differences in aspects of test method result in different conclusions about a candidate's ability? A complex programme of
research is needed to answer these questions.
The ability of computers to carry out various kinds of automatic processes on spoken or written texts is having an impact on
testing. These will include the ability to do rapid counts of the
number of tokens of individual words, to analyse the grammar of
sentences, to count pauses, to calculate the range of vocabulary,
and to analyse features of pronunciation. Already these automatic
measures of pronunciation or writing quality are being used in
place of a second human rating of performances, and have been
found to contribute as much to overall reliability as a human rating.
Of course, such computer operations have limitations. For example,
in the testing of speaking, they are bound to be better at acoustic
than auditory aspects of pronunciation, and cannot readily identify intelligibility since this is a function of unpredictable contextual factors. Nevertheless, we can expect many further rapid
advances in these fields, with direct application to testing.
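As a rough illustration of the simplest of these automatic counts, the sketch below computes a token count, a type-token ratio as a crude index of vocabulary range, and mean sentence length for a written text. The function name and the choice of measures are hypothetical; operational scoring engines rely on far richer analyses (parsing, pause detection, acoustic modelling) than anything shown here.

```python
import re

def text_measures(text):
    """Very simple automatic measures of a written performance:
    token count, vocabulary range (type-token ratio), and mean
    sentence length."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    types = set(tokens)
    return {
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        "mean_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
    }

if __name__ == "__main__":
    sample = "The test was long. The test was very long and very hard."
    print(text_measures(sample))
```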
Technology and the testing of speaking
While computers represent the most rapid point of technological
change, other less complex technologies, which have been in use
for some time, have led to similar validity questions.
Tape recorders can be used in the administration of speaking
tests. Candidates are presented with a prompt on tape, and are
asked to respond as if they were talking to a person, the response
being recorded on tape. This performance is then scored from
the tape. Such a test is called a semi-direct test of speaking, as
compared with a direct test format such as a live face-to-face
interview.
But not everybody likes speaking to tapes! We all know the difficulty many people experience in leaving messages on answering machines. Most test-takers prefer a direct rather than a semi-direct format if given the choice. But the question then arises as to whether these options are equivalent in testing terms. How far can you infer the same ability from performance on different formats? It is possible for somebody to be voluble in direct face-to-face interaction but tongue-tied when confronted with a machine, and vice versa. Research looking at the performance of the same candidates under each condition has shown that this is a complex issue, as not all candidates react in the same way (hardly surprising, of course). Some candidates prefer the tape, some prefer a live interlocutor, and performance generally improves in the condition that is preferred. But we must also add the interlocutor factor. Some candidates get on well with particular interlocutors,
others are inhibited by them. And there is the rater factor. Some
raters react negatively to tapes, and to particular interlocutors,
and may, without realizing it, either compensate or 'punish' the
candidate when giving their ratings.
Given such issues, why are semi-direct tests used? Cost considerations and the logistics of mass test administration are likely to
favour their use.
The semi-direct format is cheaper to administer, as a live interlocutor (the person who interacts with the candidate) does not have to be provided. On the other hand, the fact that the tape still has to be individually rated means that the test is by no means inexpensive; and in many face-to-face speaking tests the interlocutor and the rater are the same person, so that no real saving is achieved. In addition, the preparation of the tape and the supply of recording equipment is expensive. Nevertheless, in appropriate circumstances, considerable economies can be achieved. A further advantage is that in cases of languages where there are only a
small number of candidates presenting for assessment at any one
time, testing can be provided virtually on demand in any location.
This would not be possible if a trained interlocutor for that language had to be found. Finally, research has demonstrated that
the interlocutor you interact with may affect your score. Some
interlocutors elicit performances which trigger a favourable
impression of the candidate; others have the reverse effect. The
problem is that raters typically don't realize that it is the interlocutor's behaviour which is contributing to the impression generated-a classic case of 'blame the victim'. As a semi-direct test
removes the interlocutor variable-all candidates face the same
prompt, delivered by tape-it might be felt that the semi-direct
test has the potential to be a fairer test.
The issues raised by semi-direct tests of speaking are rapidly
becoming more urgent as pressure to make tests more communicative leads to an increased demand for speaking tests. But such
tests can often only feasibly be provided in a semi-direct format,
given huge numbers of candidates sitting for the test in a large
number of countries worldwide, as for example with a test such as
TOEFL. The issue here is a fundamental one. It illustrates the tension between the feasibility of tests (the need to design and administer them practically and cheaply if they are to be of any use at
all), and their validity. There are three basic critical dimensions of
tests (validity, reliability, and feasibility) whose demands need to
be balanced. The right balance will depend on the test context and
test purpose.
Dilemmas: whose performance?
The speed of technological advances affecting language testing
sometimes gives an impression of a field confidently moving
ahead, notwithstanding the issues of validity raised above. But
concomitantly the change in perspective from the individual to
the social nature of test performance has provoked something of
an intellectual crisis in the field. In Chapter 7 we looked at the social nature of test performance in a larger political and cultural sense; here we will examine the social character of performance at a more micro level, at the level of interaction. Developments in discourse analysis and pragmatics have revealed the essential interactivity of all communication. This is especially clear in relation
to the assessment of speaking. The problem is that of isolating the
contribution of a single individual (the candidate) in a joint communicative activity. As soon as you try to test use (as opposed to
usage) you cannot confine yourself to the single individual. So
whose performance are we assessing?
Take the following example. A Thai nurse working with
elderly patients in an American geriatric hospital setting is liked
and respected by her patients and supervising colleagues, and is
effective in her work despite glaring deficiencies in her grammar,
vocabulary and pronunciation in English. The people she communicates with expect to have to take some responsibility for the success of the communication, in view of her limited English proficiency. They contribute through the active process of drawing inferences from what she has said, checking that they have understood, and seeking clarification in various ways. All of these activities on their part contribute to successful communication with
her. Her professional knowledge of nursing is excellent, and this
helps in the framing of her communication, to make it relevant.
With her professional competence, pleasant personality, and the
need for her interlocutors to communicate with her, clinical communication seems to be successful; there is no reason to exclude her from the workplace, even though this might be suggested by a 'cold' assessment of her communication in the absence of an interlocutor, and in non-clinical contexts.
A contrasting example. A nurse from Hong Kong, a native speaker of Cantonese and a competent speaker of English by most standards, is at the centre of a controversy in a hospital in an English-speaking country. A sudden emergency with a patient in the ward requires the nurse to make a telephone call to the receptionist, a native speaker of English, for urgent help. The receptionist claims not to be able to understand the nurse, the message does not get through, and the patient dies. It turns out that the receptionist has a reputation for being racist. Is it possible that she in a sense refused to understand? Whatever the explanation, communication did not take place. Whom should we blame for this breakdown?
In each of these examples, it is not clear who is responsible for
the success or failure of the communication. It seems that success
or failure is a joint achievement: the communication is a co-construction. In assessment, should we not then take the interlocutor into account in our predictions of successful communication? But how can that be done? And how can this be made to fit the institutional need for a score about individual candidates on their own, not about individuals and their interlocutors? Is proficiency best understood as something that individuals carry round in their heads with them, or does it only exist in actual performances, which are never solo? Note that the issue of the joint
responsibility for communication raised here relates not only to
communication involving non-native speakers; it is equally relevant for communication between native speakers. What is at issue
here are general pragmatic conditions of normal communication,
and the difficulty of pinning them down in any testing procedure.
This is then another fundamental dilemma for language testing.
The issues raised here show the way in which language testing,
as in other fields of assessment, is crucially dependent on definitions of the test construct. It is thus, in a way, vulnerable to our
evolving understanding of language and communication, and
cannot be protected by its success in other aspects, for example
advances in the technical aspects of psychometrics or in the technology of assessment. The disconcerting aspect of the current situation is that a growing loss of confidence in the possibility or even desirability of locating competence in the individual, as illustrated in the examples presented above, seems to challenge the very adequacy of our current theories of measurement, with their promise of providing a single summary score as the basis for the reliable classification decision that we seek. Instead of the individual carrying a measurable proficiency round in his or her head, we
have a multiplicity of selves in interaction in a multiplicity of
interactional contexts. How can measurement do justice to this?
And in the dazzle of technological advance, we may need a continuing reminder of the nature of communication as a shared
human activity, and that the idea that one of the participants can
be replaced by a machine is really a technological fantasy.
Language testing remains a complex and perplexing activity.
While insights from evolving theories of communication may be
disconcerting, it is necessary to fully grasp them and the challenge
they pose if our assessments are to have any chance of having the
meaning we intend them to have. Language testing is an uncertain
and approximate business at the best of times, even if to the outsider this may be camouflaged by its impressive, even daunting,
technical (and technological) trappings, not to mention the
authority of the institutions whose goals tests serve. Every test is
vulnerable to good questions, about language and language use,
about measurement, about test procedures, and about the uses to
which the information in tests is to be put. In particular, a language test is only as good as the theory of language on which it is
based, and it is within this area of theoretical inquiry into the
essential nature of language and communication that we need to
develop our ability to ask the next question. And the next.
SECTION 2
Readings
Chapter 1
Testing, testing ... What is a language test?

Text 1
ALAN DAVIES: 'The construction of language tests' in J.P.B. Allen and Alan Davies (eds.): The Edinburgh Course in Applied Linguistics Volume 4: Testing and Experimental Methods. Oxford University Press 1977, pages 45-46
In this paper, Davies distinguishes four important uses or
functions of language tests: achievement, proficiency, aptitude, and diagnostic. In this extract he discusses the first two
of these.
Achievement
Achievement or attainment tests are concerned with assessing
what has been learned of a known syllabus. This may be within a
school or within a total educational system. Thus the typical
external school examinations ('Ordinary' level or 'Advanced'
level in England, 'Highers' in Scotland), the university degree
exams and so on are all examples of achievement tests. The use
being made of the measure is to find out just how much has been
learned of what has been taught (i.e., of the syllabus).
Achievement type tests end there. Although the primary interest is in the past, i.e. what has been learned, very often some further use
is made of the same test in order to make meaningful decisions
about the pupils' future. It would, presumably, be possible to be
interested entirely in the past of the pupils; Carroll's 'meaningful
decisions' then would refer to the syllabus, i.e., to any necessary
alterations to it that might be necessary or to the teaching method
to be used for the next group of students. But achievement tests
are almost always used for other purposes as well. It is important
to recognize this and to account for it in one's test construction.
But, as will be maintained later under validity, this is essentially a
function of the syllabus. All that an achievement test can do is to
indicate how much of a syllabus has been learned; it cannot make
predictions as to pupils' future performance unless the syllabus
has been expressly designed for this purpose.
▷
What are some of the functions of the examinations Davies
mentions (external school examinations, university degree
examinations) other than looking back over what has been
learned?
▷
What 'future performance' does the writer have in mind? In
what way can the design of a syllabus be used as the basis for
predictions as to pupils' future performance?
Proficiency
Proficiency tests, as we see it, are concerned with assessing what
has been learned of a known or an unknown syllabus. Here we see
the distinction between proficiency and achievement. In the non-language field we might consider, say, a driving test as a kind of proficiency test since there is the desire to apply a common standard to all who present themselves whatever their previous driving
experience, over which of course there has been no control at all.
In the language field there are several well-known proficiency
exams of the same journeyman kind: the Cambridge Proficiency
Exams, the Michigan Tests, the Test of English as a Foreign
Language (TOEFL) and English Proficiency Test Battery (EPTB).
These all imply that a common standard is being applied to all
comers. More sophisticated proficiency tests (more sophisticated
in use, not in design) may be constructed as research tools to determine just how much control over a language is needed for certain
purposes, for example medical studies in a second language.
▷
How does the fact that a proficiency test may relate to an
unknown syllabus serve as the basis for a distinction from
achievement tests?
▷
If syllabus content is absent as a basis for the content of a proficiency test, how can we decide what it should contain?
Chapter 2
Communication and the design of language tests

Text 2
ROBERT LADO: Language Testing: The Construction and Use of Foreign Language Tests. Longmans 1961, pages 22-24
Lado presents the case for basing language tests on a theory of
language description and a theory of learning, in particular on
the points of structural contrast between the learner's first language and the target language. His recommendations about testing dominated practice for nearly twenty years, and are still influential in powerful tests such as TOEFL.
The theory of language testing assumes that language is a system
of habits of communication. These habits permit the communicant to give his conscious attention to the over-all meaning he is
conveying or perceiving. These habits involve matters of form,
meaning and distribution at several levels of structure, namely
those of the sentence, clause, phrase, word, morpheme and
phoneme. Within these levels are structures of modification,
sequence, parts of sentences. Below them are habits of articulation, syllable type, and collocations. Associated with them and sometimes part of them are patterns of intonation, stress and rhythm ...
The individual is not aware that so much of what he does in
using language is done through a complex system of habits. When
he attempts to communicate in a foreign language that he knows