part of language knowledge, yet does have an impact on performance, should it be included as part of the focus of assessment?
After all, competent native speakers differ in their conversational
facility and their preparedness to take risks in communication,
and these differences of temperament rather than competence are
likely to carry over into second language communication. If we
are to judge strategic competence, by what standards should we
do so, given the variability among native speakers in this regard?
On the other hand, if we are to exclude strategic competence as a
target of assessment, how can we equalize its impact on impressions of performance? In other words, what at first look like
abstract theoretical debates about the nature of competence and
performance in language tests have very practical consequences
for test design and for the procedures by which judges will make
ratings of performance.
Apart from the increasing specification of what knowledge is
presupposed in communication, there has also been an attempt to
grasp the slippery issue of what things other than knowledge are
called upon in performance in communicative tests, particularly
where performances involve interaction with another person, as
in oral performance tests. These will include confidence, motivation, emotional states, the identities of candidate and interlocutor, and so on. Of course, an awareness of the complexity of
factors involved in performance complicates enormously the task
of standardizing the conditions of assessment (in the interest of
fairness). The slowness with which the field has come to grips
with the issues involved is perhaps motivated by a reluctance to
face the difficulties of achieving a fair assessment in performance
tests.
Another development has been the attempt to characterize the
real world tasks in the criterion situation identified through job
analysis in terms of the aspects of ability or skill (as specified in
the model of ability) they call upon. This has involved the analysis
of tasks in terms of those components of knowledge that they
require, so that performance on tasks can be used as evidence of
command (or otherwise) of specific components of knowledge
and skill. In this way the content of test tasks and test method are
specified more precisely and this can provide a more explicit basis
for claims to the validity of interpretations of test performance.
It should be noted that the approach to thinking about communicative language ability in terms of discrete components leaves us with aspects of language analysed out as distinct and unrelated. There is still therefore the problem, which models of communicative competence were designed to resolve, of how to account for the way the different aspects act upon each other in actual communication. Paradoxically, as models of communicative competence become more analytic, so they take us back to the
problems of discrete point testing usually associated with testing
of form alone.
Nevertheless, the elaboration of models of abilities underlying performance has been helpful for both mapping research in language testing and classifying language tests, and providing language test developers and researchers with a common language to talk about the focus of their work. And even though the modelling of communicative language ability may appear somewhat
dauntingly complex and even abstract at times, the issues being
considered in this debate have very clear practical consequences
as we have seen.
But it is also true that attempts to apply a complex framework
for modelling communicative language ability directly in test
design have not always proved easy, mainly because of the complexity of the framework. This has sometimes resulted in a rather tokenistic acknowledgement of the framework and then a disregard for it at the stage of practical test design. Where a more thorough attempt to base test design on the framework has been
made, the procedures for specifying test content prove to be
unwieldy.
If communicative tests are to move forward they will need to
address the problem of feature isolation raised earlier, whereby features of language use are analysed out and performance necessarily distorted because performance is not a collection of features but an integrated interplay between them. The issue raised here takes us back to the points made previously, in Chapter 1, that criterion behaviour is bound to be elusive and in principle beyond the scope of assessment in a direct sense. A further issue involves the implications for test validity of interpreting test performance, for example on a speaking test, in terms of only one of the participants, the candidate. Clearly, many others than the candidate affect the chances of the candidate achieving a successful score for the performance. These will include those who frame the opportunity for performance at the test design stage; those with whom the candidate interacts; those who rate the performance; and those responsible for designing and managing the rating procedure. Instead of focusing on the candidate in isolation, the candidate's performance needs to be seen and evaluated as part of a joint construction by a number of participants, including interlocutors, test designers, and raters. The intrinsically social character of test performance is discussed at length in Chapter 7.
Conclusion
In this chapter we have examined a number of influential
'schools' of language testing, the latter ones claiming to supersede
the earlier ones on the grounds of the advances they have made in
understanding the essential nature of performance (language use)
in the criterion. In fact, the testing practices associated with earlier approaches have far from disappeared, which is why appreciating earlier work is necessary for understanding the current
rather eclectic scene in language testing.
Historically, views of performance in the criterion situation
have focused either on the cognitive abilities that the individual
brings to it or on its social character. Attempts have also been
made to resolve the inevitable difference between these perspectives. The ability to participate in the social nature of interaction is seen as depending on the candidate knowing certain socially determined communicative conventions, for example, how to match the form of language to the topic, the setting, the interlocutor, and so on. In this way, an understanding of the social character of the criterion setting is formulated in terms of relevant
knowledge of socially determined communicative conventions.
The social dimension of communication then becomes part of
what the candidate needs to know, and can thus be part of the
cognitive dimensions of successful performance. Although all
tests imply a view of the nature of the criterion, these views are
not always explicit in testing schemes.
3
The testing cycle
Designing and introducing a new test is a little like getting a new
car on the road. It involves a design stage, a construction stage,
and a try-out stage before the test is finally operational. But that
suggests a linear process, whereas in fact test development
involves a cycle of activity, because the actual operational use of
the test generates evidence about its own qualities. We need to
pay attention to this information, indeed actively to seek it out,
and use it to do further thinking about the test, and so another
turn of the cycle begins.
In this chapter we will outline the stages and typical procedures
in this cyclical process. Further details about some of the stages
will be given in subsequent chapters.
Who starts the cycle turning? New situations arise, usually
associated with social or political changes, which generate the
need for a new test or assessment procedure. These include the
growth of international education, increased labour flows
between countries as the result of treaties, the educational impact
of immigration or refugee programmes, school curriculum
reform, or reform of vocational education and training for adults
in the light of technological change. For example, the needs of the
US Government during the Cold War for personnel who could handle spoken communication in a range of strategically important languages inspired one of the most forward-looking developments in language testing, the Oral Proficiency Interview (OPI). In this procedure, performance in a short interaction with a native speaker interlocutor is judged against a set of descriptions of performance at various levels. With various modifications, this has
remained the most commonly used means for the direct testing of
spoken language skills. The political and social origins and meaning of language tests have recently been brought more clearly into focus, and the complex responsibilities of the language tester as the agent of political, commercial, and bureaucratic forces are the subject of discussion in Chapter 7.
Those responsible for managing the implications of change,
usually in corporations or bureaucracies, commission the work of
test developers. When school systems are involved, the work of
responding to changing needs is met from within education ministries, often with the assistance of university researchers. But particularly with the assessment of adults, or where assessment
involves international contexts, testing agencies with specialist
expertise in language testing become involved. Such agencies are
responsible for the two major tests used to measure the English of
international students wishing to study in universities in the
English-speaking world: the American Test of English as a Foreign Language (TOEFL) and the British/Australian International English Language Testing System (IELTS).
Understanding the constraints
Before they begin thinking in detail about the design of a test, test
developers will need to get the lay of the land, that is, to establish
the constraints under which they are working, and under which
the test will be administered. What resources, physical and financial, are available for test development and test operation? There
is no point in proposing a performance test if there is no money
available for the provision of properly trained raters, or if the provision of trained raters cannot be guaranteed in certain remote
locations in which the test is to be administered. Tests of speaking
and listening delivered in language laboratories, or tests delivered
via computer, are not practical options where the technology is
not available. Test security is also a constraint: can we be sure
that detailed knowledge of the contents of any version will be
kept from candidates until the time of the examination? The functions that any assessment procedure is required to perform can
also act as an important constraint. For example, there is an
increasing tendency for governments to require the reporting of
the success of language programmes against national scales or
benchmarks. This may mean that any procedure that teachers use
in their classrooms to gather information on the progress of learners
may have to be compatible with such over-arching reporting
schemes.
Following this initial ground-clearing, we move on to the
detailed design of the test. This will involve procedures to establish test content, what the test contains, and test method, the way
in which it will appear to the test-taker, the format in which
responses will be required, and how these responses will be
scored.
Test content
From a practical point of view test design begins with decisions
about test content, what will go into the test. In fact, these decisions imply a view of the test construct, the way language and language use in test performance are seen, together with the relationship of test performance to real-world contexts of use. In the
previous chapter, we explored a number of current approaches to
thinking about test constructs. In major test projects, articulating
and defining the test construct may be the first stage of test development, resulting in an elaborated statement of the theoretical framework for the test. Even here constraints can operate; the new test may have to fit into an approach which has been determined in advance, for example by educational policy makers.
This is currently the case in assessment which takes place as part
of vocational training, where the approach to training will determine the approach to assessment; this is discussed further in Chapter 7 on the institutional character of language tests.
Establishing test content involves careful sampling from the domain of the test, that is, the set of tasks or the kinds of behaviours in the criterion setting, as informed by our understanding of the test construct. Depending on the construct, the test domain is typically defined in one of two ways. It can be defined operationally, as a set of practical, real-world tasks. Sampling then involves choosing the most characteristic tasks from the domain, for example, in terms of their frequency or importance. Alternatively, the domain can be defined in terms of a more abstract construct, for example, in terms of a theory of the components of knowledge and ability that underlie performance in the domain. For example, it may be defined in terms of knowledge
of the grammatical system, or of vocabulary, or of features of pronunciation, or ability to perform aspects of the skill areas of speaking, listening, reading, or writing. In this case, the test will sample, in a principled way, from a range of frequent grammatical structures or of items of vocabulary at the appropriate level, or
will include each relevant skill area.
In the former case, if the performance domain is associated
with particular known roles, for example, those occurring in a
work or study skills setting, then a job analysis is carried out so
that the communicative roles facing test-takers in the criterion
situation can be determined and used as the basis for test design.
This job analysis will typically involve eliciting the insights of
those familiar with the target setting, for example, non-native
speakers who are currently working within it. Other suitable
informants will be job educators or trainers and other experts
whose work requires them to have an articulated understanding
of the character and demands of the setting. Methods used will
include questionnaires and interviews. It may also be possible to draw on a literature analysing the characteristics of the communicative demands of the setting; this is true in the area of
medical communication, for example. When the job analysis has
been completed, test materials will be written reflecting the
domain, and a panel of experts who know the nature of the work involved may be asked to judge their relevance, coverage, and
authenticity.
Test method
The next thing to consider in test design is the way in which
candidates will be required to interact with the test materials, particularly the response format, that is, the way in which the candidate will be required to respond to the materials. (The term test
method covers these aspects of design together with the issue of
how candidate responses will be rated or scored.) There are two
broad approaches to understanding the relation of test method to
test content. The first sees method as an aspect of content, and
raises issues of authenticity; the second, more traditional
approach treats method independently of content, and allows
more obviously inauthentic test response formats.
Authenticity of response
The job analysis discussed earlier will identify the range of communicative roles and tasks which characterize the criterion setting. This provides a basis not only for determining the kinds of texts to be included in the test, but also how candidates will interact with them. We may attempt to reproduce, as far as is possible in the test setting, the conditions under which they are processed or produced in reality. In this way, test method involves simulation of the criterion as much as other aspects of the test materials. The test method can itself become an aspect of relevant test content, if we define content not only in terms of the texts to be
included but how they are used.
However, such an approach raises the issue of authenticity of
response. There are competing imperatives. On the one hand it is
desirable to replicate, as far as is possible in the test setting, the
conditions under which engagement with communicative content
is done in the criterion setting, so that inferences from the test performance to likely future performance in the criterion can be as direct as possible. On the other hand, it is necessary to have a procedure that is fair to all candidates, and elicits a scorable performance, even if this means involving the candidates in somewhat
artificial behaviour. Once again, test design involves a sort of
principled compromise. Let us consider this issue firstly in the
context of assessing listening comprehension, and then in the context of the assessment of speaking.
In the development of a test of English as a foreign language for
international students, a job analysis may reveal that listening to
lectures is an important part of the candidates' future role as students, and so it makes sense to include listening to a lecture as part of the test. But how should evidence of such comprehension ability be sought? What form should the test task take? There are a
number of possibilities:
1 The task replicates what students have to do in the target situation. The candidate is asked to take notes, which are then scored for the accuracy of their content. But students take notes in very different ways, some of which may cause difficulties in scoring. For example, if a particular person fails to make a note about a certain point, it may be because it has not been understood, but it may also be because the note-taker has determined that it does not merit a note.
2 If candidates are required to answer pre-set questions, time might be given for reading them prior to the listening, or candidates might simply be required to read as they listen. If prior, either all the questions are presented at once, or a few are presented at a time. But if the latter, how many? That is, how long a chunk of lecture should the candidate have to process at any one time? Obviously, too long a stretch may introduce irrelevant memory considerations, and the test becomes as much a test of memory as of listening comprehension. On the other hand, if too short, then the task of following extended stretches of discourse is not represented in the test.
3 A candidate might be required to listen to the input just once, or more than once. Obviously repetition is unlike the real world in the sense that lectures are not repeated. On the other
hand it may be argued that many students make a practice of
audio-taping lectures to facilitate comprehension, recall, and
note-taking following the lecture.
All decisions about test method in such a context inevitably involve a compromise between the desirability of an appearance of authenticity on the one hand and the practicalities imposed by the test situation on the other. Note that the way we resolve this compromise may have the undesired effect of jeopardizing the fairness of the conclusions we reach about individual candidates. For example, in the case where a candidate has been judged not to meet a required standard, might observation of his/her performance under more natural conditions have led us to a different conclusion? Methods of investigating the impact of decisions about test method on the fairness of our judgements will be taken up in Chapter 5 on validity.
In relation to the assessment of speaking, related questions of
authenticity arise. Consider the case of immigrant non-native
speaker teachers who will have to teach their subject area (science or mathematics, let us say) through the medium of a
second language. Or consider teachers of foreign languages who
wish to conduct their classes through the medium of the target
language. In order to assess whether teachers in each group are
communicatively competent to manage their classes in their second language, approaches ranging along a continuum of authenticity are conceivable.
Consider an aspect of classroom management, giving instructions for an activity. The most contrived, yet the most manageable from a test administration point of view, is to give a paper-and-pencil assessment of ability to handle this task, for example, by getting candidates to write out what they would say in giving such instructions. This might be an adequate test of candidates' control of the content of the instructions, but not give us any evidence about their ability to execute them, particularly in the context of interaction. Alternatively, we might attempt to simulate the task with a role-play simulation of giving instructions for the setting up of a specific classroom activity, perhaps in a one-to-one setting with an examiner playing the role of a student being given the instructions. More realistically still, we may
require the teacher to teach an actual lesson, either with a specially assembled group of students, or in the context of an actual
school lesson. Particular times and occasions of observation in
the actual work setting may be agreed upon, for example during
the course of school practice during a teacher training course.
Alternatively, non-native speaker teachers in some cases may be
given professional
accreditation, and the adequacy of their proficiency in the second language to handle the communicative tasks
of the work setting may be assessed over an extended period as