part of language knowledge, yet does have an impact on performance, should it be included as part of the focus of assessment?
After all, competent native speakers differ in their conversational
facility and their preparedness to take risks in communication,
and these differences of temperament rather than competence are
likely to carry over into second language communication. If we
are to judge strategic competence, by what standards should we
do so, given the variability among native speakers in this regard?
On the other hand, if we are to exclude strategic competence as a
target of assessment, how can we equalize its impact on impressions of performance? In other words, what at first look like
abstract theoretical debates about the nature of competence and
performance in language tests have very practical consequences
for test design and for the procedures by which judges will make
ratings of performance.
Apart from the increasing specification of what knowledge is
presupposed in communication, there has also been an attempt to
grasp the slippery issue of what things other than knowledge are
called upon in performance in communicative tests, particularly
where performances involve interaction with another person, as
in oral performance tests. These will include confidence, motivation, emotional states, the identities of candidate and interlocutor, and so on. Of course, an awareness of the complexity of
factors involved in performance complicates enormously the task
of standardizing the conditions of assessment (in the interest of
fairness). The slowness with which the field has come to grips
with the issues involved is perhaps motivated by a reluctance to
face the difficulties of achieving a fair assessment in performance
tests.
Another development has been the attempt to characterize the
real world tasks in the criterion situation identified through job
analysis in terms of the aspects of ability or skill (as specified in
the model of ability) they call upon. This has involved the analysis
of tasks in terms of those components of knowledge that they
require, so that performance on tasks can be used as evidence of
command (or otherwise) of specific components of knowledge
and skill. In this way the content of test tasks and test method are
specified more precisely and this can provide a more explicit basis
for claims to the validity of interpretations of test performance.
It should be noted that the approach to thinking about communicative language ability in terms of discrete components leaves us with aspects of language analysed out as distinct and unrelated. There is still therefore the problem, which models of communicative competence were designed to resolve, of how to account for the way the different aspects act upon each other in actual communication. Paradoxically, as models of communicative competence become more analytic, so they take us back to the
problems of discrete point testing usually associated with testing
of form alone.
Nevertheless, the elaboration of models of abilities underlying performance has been helpful for both mapping research in language testing and classifying language tests, and providing language test developers and researchers with a common language to talk about the focus of their work. And even though the modelling of communicative language ability may appear somewhat
dauntingly complex and even abstract at times, the issues being
considered in this debate have very clear practical consequences
as we have seen.
But it is also true that attempts to apply a complex framework
for modelling communicative language ability directly in test
design have not always proved easy, mainly because of the complexity of the framework. This has sometimes resulted in a rather tokenistic acknowledgement of the framework and then a disregard for it at the stage of practical test design. Where a more thorough attempt to base test design on the framework has been
made, the procedures for specifying test content prove to be
unwieldy.
If communicative tests are to move forward they will need to
address the problem of feature isolation raised earlier, whereby features of language use are analysed out and performance necessarily distorted because performance is not a collection of features but an integrated interplay between them. The issue raised here takes us back to the points made previously, in Chapter 1, that criterion behaviour is bound to be elusive and in principle beyond the scope of assessment in a direct sense. A further issue involves the implications for test validity of interpreting test performance, for example on a speaking test, in terms of only one of the participants, the candidate. Clearly, many others than the candidate affect the chances of the candidate achieving a successful score for the performance. These will include those who frame the opportunity for performance at the test design stage; those with whom the candidate interacts; those who rate the performance; and those responsible for designing and managing the rating procedure. Instead of focusing on the candidate in isolation, the candidate's performance needs to be seen and evaluated as part of a joint construction by a number of participants, including interlocutors, test designers, and raters. The intrinsically social character of test performance is discussed at length in Chapter 7.
Conclusion
In this chapter we have examined a number of influential
'schools' of language testing, the latter ones claiming to supersede
the earlier ones on the grounds of the advances they have made in
understanding the essential nature of performance (language use)
in the criterion. In fact, the testing practices associated with earlier approaches have far from disappeared, which is why appreciating earlier work is necessary for understanding the current
rather eclectic scene in language testing.
Historically, views of performance in the criterion situation
have focused either on the cognitive abilities that the individual
brings to it or on its social character. Attempts have also been
made to resolve the inevitable difference between these perspectives. The ability to participate in the social nature of interaction is seen as depending on the candidate knowing certain socially determined communicative conventions, for example, how to match the form of language to the topic, the setting, the interlocutor, and so on. In this way, an understanding of the social character of the criterion setting is formulated in terms of relevant
knowledge of socially determined communicative conventions.
The social dimension of communication then becomes part of
what the candidate needs to know, and can thus be part of the
cognitive dimensions of successful performance. Although all
tests imply a view of the nature of the criterion, these views are
not always explicit in testing schemes.
3
The testing cycle
Designing and introducing a new test is a little like getting a new
car on the road. It involves a design stage, a construction stage,
and a try-out stage before the test is finally operational. But that
suggests a linear process, whereas in fact test development
involves a cycle of activity, because the actual operational use of
the test generates evidence about its own qualities. We need to
pay attention to this information, indeed actively to seek it out,
and use it to do further thinking about the test, and so another
turn of the cycle begins.
In this chapter we will outline the stages and typical procedures
in this cyclical process. Further details about some of the stages
will be given in subsequent chapters.
Who starts the cycle turning? New situations arise, usually
associated with social or political changes, which generate the
need for a new test or assessment procedure. These include the
growth of international education, increased labour flows
between countries as the result of treaties, the educational impact
of immigration or refugee programmes, school curriculum
reform, or reform of vocational education and training for adults
in the light of technological change. For example, the needs of the
US Government during the Cold War for personnel who could handle spoken communication in a range of strategically important languages inspired one of the most forward-looking developments in language testing, the Oral Proficiency Interview (OPI). In this procedure, performance in a short interaction with a native speaker interlocutor is judged against a set of descriptions of performance at various levels. With various modifications, this has
remained the most commonly used means for the direct testing of
spoken language skills. The political and social origins and meaning of language tests have recently been brought more clearly into focus, and the complex responsibilities of the language tester as the agent of political, commercial, and bureaucratic forces are the subject of discussion in Chapter 7.
Those responsible for managing the implications of change,
usually in corporations or bureaucracies, commission the work of
test developers. When school systems are involved, the work of
responding to changing needs is met from within education ministries, often with the assistance of university researchers. But particularly with the assessment of adults, or where assessment
involves international contexts, testing agencies with specialist
expertise in language testing become involved. Such agencies are
responsible for the two major tests used to measure the English of
international students wishing to study in universities in the
English-speaking world: the American Test of English as a Foreign Language (TOEFL) and the British/Australian International English Language Testing System (IELTS).
Understanding the constraints
Before they begin thinking in detail about the design of a test, test
developers will need to get the lay of the land, that is, to establish
the constraints under which they are working, and under which
the test will be administered. What resources, physical and financial, are available for test development and test operation? There
is no point in proposing a performance test if there is no money
available for the provision of properly trained raters, or if the provision of trained raters cannot be guaranteed in certain remote
locations in which the test is to be administered. Tests of speaking
and listening delivered in language laboratories, or tests delivered
via computer, are not practical options where the technology is
not available. Test security is also a constraint: can we be sure
that detailed knowledge of the contents of any version will be
kept from candidates until the time of the examination? The functions that any assessment procedure is required to perform can
also act as an important constraint. For example, there is an
increasing tendency for governments to require the reporting of
the success of language programmes against national scales or
benchmarks. This may mean that any procedure that teachers use
in their classrooms to gather information on the progress of learners
may have to be compatible with such over-arching reporting
schemes.
Following this initial ground-clearing, we move on to the
detailed design of the test. This will involve procedures to establish test content, what the test contains, and test method, the way
in which it will appear to the test-taker, the format in which
responses will be required, and how these responses will be
scored.
Test content
From a practical point of view test design begins with decisions
about test content, what will go into the test. In fact, these decisions imply a view of the test construct, the way language and language use in test performance are seen, together with the relationship of test performance to real-world contexts of use. In the
previous chapter, we explored a number of current approaches to
thinking about test constructs. In major test projects, articulating
and defining the test construct may be the first stage of test development, resulting in an elaborated statement of the theoretical framework for the test. Even here constraints can operate; the new test may have to fit into an approach which has been determined in advance, for example by educational policy makers.
This is currently the case in assessment which takes place as part
of vocational training, where the approach to training will determine the approach to assessment; this is discussed further in Chapter 7 on the institutional character of language tests.
Establishing test content involves careful sampling from the domain of the test, that is, the set of tasks or the kinds of behaviours in the criterion setting, as informed by our understanding of the test construct. Depending on the construct, the test domain is typically defined in one of two ways. It can be defined operationally, as a set of practical, real-world tasks. Sampling then involves choosing the most characteristic tasks from the domain, for example, in terms of their frequency or importance. Alternatively, the domain can be defined in terms of a more abstract construct, for example, in terms of a theory of the components of knowledge and ability that underlie performance in the domain. For example, it may be defined in terms of knowledge
of the grammatical system, or of vocabulary, or of features of pronunciation, or ability to perform aspects of the skill areas of speaking, listening, reading, or writing. In this case, the test will sample, in a principled way, from a range of frequent grammatical structures or of items of vocabulary at the appropriate level, or
will include each relevant skill area.
In the former case, if the performance domain is associated
with particular known roles, for example, those occurring in a
work or study skills setting, then a job analysis is carried out so
that the communicative roles facing test-takers in the criterion
situation can be determined and used as the basis for test design.
This job analysis will typically involve eliciting the insights of
those familiar with the target setting, for example, non-native
speakers who are currently working within it. Other suitable
informants will be job educators or trainers and other experts
whose work requires them to have an articulated understanding
of the character and demands of the setting. Methods used will
include questionnaires and interviews. It may also be possible to draw on a literature analysing the characteristics of the communicative demands of the setting; this is true in the area of
medical communication, for example. When the job analysis has
been completed, test materials will be written reflecting the
domain, and a panel of experts who know the nature of the work involved may be asked to judge their relevance, coverage, and
authenticity.
Test method
The next thing to consider in test design is the way in which
candidates will be required to interact with the test materials, particularly the response format, that is, the way in which the candidate will be required to respond to the materials. (The term test
method covers these aspects of design together with the issue of
how candidate responses will be rated or scored.) There are two
broad approaches to understanding the relation of test method to
test content. The first sees method as an aspect of content, and
raises issues of authenticity; the second, more traditional
approach treats method independently of content, and allows
more obviously inauthentic test response formats.
Authenticity of response
The job analysis discussed earlier will identify the range of communicative roles and tasks which characterize the criterion setting. This provides a basis not only for determining the kinds of texts to be included in the test, but also how candidates will interact with them. We may attempt to reproduce, as far as is possible in the test setting, the conditions under which they are processed or produced in reality. In this way, test method involves simulation of the criterion as much as other aspects of the test materials. The test method can itself become an aspect of relevant test content, if we define content not only in terms of the texts to be
included but how they are used.
However, such an approach raises the issue of authenticity of
response. There are competing imperatives. On the one hand it is
desirable to replicate, as far as is possible in the test setting, the
conditions under which engagement with communicative content
is done in the criterion setting, so that inferences from the test performance to likely future performance in the criterion can be as direct as possible. On the other hand, it is necessary to have a procedure that is fair to all candidates, and elicits a scorable performance, even if this means involving the candidates in somewhat
artificial behaviour. Once again, test design involves a sort of
principled compromise. Let us consider this issue firstly in the
context of assessing listening comprehension, and then in the context of the assessment of speaking.
In the development of a test of English as a foreign language for
international students, a job analysis may reveal that listening to
lectures is an important part of the candidates' future role as students, and so it makes sense to include listening to a lecture as part of the test. But how should evidence of such comprehension ability be sought? What form should the test task take? There are a
number of possibilities:
1 The task replicates what students have to do in the target situation. The candidate is asked to take notes, which are then scored for the accuracy of their content. But students take notes in very different ways, some of which may cause difficulties in scoring. For example, if a particular person fails to make a note about a certain point, it may be because it has not been understood, but it may also be because the note-taker has determined that it does not merit a note.
2 If candidates are required to answer pre-set questions, time might be given for reading them prior to the listening, or candidates might simply be required to read as they listen. If prior, either all the questions are presented at once, or a few are presented at a time. But if the latter, how many? That is, how long a chunk of lecture should the candidate have to process at any one time? Obviously, too long a stretch may introduce irrelevant memory considerations, and the test becomes as much a test of memory as of listening comprehension. On the other hand, if too short, then the task of following extended stretches of discourse is not represented in the test.
3 A candidate might be required to listen to the input just once, or more than once. Obviously repetition is unlike the real world in the sense that lectures are not repeated. On the other
hand it may be argued that many students make a practice of
audio-taping lectures to facilitate comprehension, recall, and
note-taking following the lecture.
All decisions about test method in such a context inevitably involve a compromise between the desirability of an appearance of authenticity on the one hand and the practicalities imposed by the test situation on the other. Note that the way we resolve this compromise may have the undesired effect of jeopardizing the fairness of the conclusions we reach about individual candidates. For example, in the case where a candidate has been judged not to meet a required standard, might observation of his/her performance under more natural conditions have led us to a different conclusion? Methods of investigating the impact of decisions about test method on the fairness of our judgements will be taken up in Chapter 5 on validity.
In relation to the assessment of speaking, related questions of
authenticity arise. Consider the case of immigrant non-native
speaker teachers who will have to teach their subject area (science or mathematics, let us say) through the medium of a
second language. Or consider teachers of foreign languages who
wish to conduct their classes through the medium of the target
language. In order to assess whether teachers in each group are
communicatively competent to manage their classes in their second language, approaches ranging along a continuum of authenticity are conceivable.
Consider an aspect of classroom management, giving instructions for an activity. The most contrived, yet the most manageable from a test administration point of view, is to give a paper-and-pencil assessment of ability to handle this task, for example, by getting candidates to write out what they would say in giving such instructions. This might be an adequate test of candidates' control of the content of the instructions, but not give us any evidence about their ability to execute them, particularly in the context of interaction. Alternatively, we might attempt to simulate the task with a role-play simulation of giving instructions for the setting up of a specific classroom activity, perhaps in a one-to-one setting with an examiner playing the role of a student being given the instructions. More realistically still, we may
require the teacher to teach an actual lesson, either with a specially assembled group of students, or in the context of an actual
school lesson. Particular times and occasions of observation in
the actual work setting may be agreed upon, for example during
the course of school practice during a teacher training course.
Alternatively, non-native speaker teachers in some cases may be
given professional
accreditation, and the adequacy of their proficiency in the second language to handle the communicative tasks
of the work setting may be assessed over an extended period as