Oxford Introductions to Language Study
Language Testing
Tim McNamara is Associate Professor
in the Department of Linguistics and
Applied Linguistics at the University
of Melbourne.
Published in this series:
Rod Ellis:
Second Language Acquisition
Claire Kramsch:
Language and Culture
Thomas Scovel:
Psycholinguistics
Bernard Spolsky:
Sociolinguistics
H. G. Widdowson:
Linguistics
George Yule:
Pragmatics
Oxford Introductions to Language Study
Series Editor H.G. Widdowson
Tim McNamara
OXFORD
UNIVERSITY PRESS
Great Clarendon Street, Oxford ox2 6DP
Oxford University Press is a department of the University of Oxford.
It furthers the University's objective of excellence in research, scholarship,
and education by publishing worldwide in
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
OXFORD and OXFORD ENGLISH are registered trade marks of
Oxford University Press in the UK and in certain other countries
© Oxford University Press 2000
The moral rights of the author have been asserted
Database right Oxford University Press (maker)
First published 2000
2014 2013 2012 2011 2010
10 9 8 7
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
without the prior permission in writing of Oxford University Press (with
the sole exception of photocopying carried out under the conditions stated
in the paragraph headed 'Photocopying'), or as expressly permitted by law, or
under terms agreed with the appropriate reprographics rights organization.
Enquiries concerning reproduction outside the scope of the above should
be sent to the ELT Rights Department, Oxford University Press, at the
address above
You must not circulate this book in any other binding or cover
and you must impose this same condition on any acquirer
Photocopying
The Publisher grants permission for the photocopying of those pages
marked 'photocopiable' according to the following conditions. Individual
purchasers may make copies for their own use or for use by classes that
they teach. School purchasers may make copies for use by staff and students,
but this permission does not extend to additional schools or branches
Under no circumstances may any part of this book be photocopied for resale
Any websites referred to in this publication are in the public domain and
their addresses are provided by Oxford University Press for information only.
Oxford University Press disclaims any responsibility for the content
ISBN-13: 978 0 19 437222 0
Printed in China
To Terry Quinn
Contents
Preface
Author's preface
SECTION 1
Survey 1

1 Testing, testing ... What is a language test? 3
  Understanding language testing 4
  Types of test 5
  Test purpose 6
  The criterion 7
  The test-criterion relationship 10
  Conclusion 11

2 Communication and the design of language tests 13
  Discrete point tests 13
  Integrative and pragmatic tests 14
  Communicative language tests 16
  Models of communicative ability 17
  Conclusion 21

3 The testing cycle 23
  Understanding the constraints 24
  Test content 25
  Test method 26
  Authenticity of response 27
  Fixed and constructed response formats 29
  Test specifications 31
  Test trials 32
  Conclusion 33

4 The rating process 35
  Establishing a rating procedure 36
  The problem with raters 37
  Establishing a framework for making judgements 38
  Rating scales 40
  Holistic and analytic ratings 43
  Rater training 44
  Conclusion 44

5 Validity: testing the test 47
  Threats to test validity 50
  Test content 50
  Test method and test construct 52
  The impact of tests 53
  Conclusion 54

6 Measurement 55
  Introduction 55
  Measurement 56
  Quality control for raters 56
  Investigating the properties of individual test items 59
  Norm-referenced and criterion-referenced measurement 62
  New approaches to measurement 64
  Conclusion 65

7 The social character of language tests 67
  Introduction 67
  The institutional character of assessment 68
  Assessment and social policy 68
  Assessment and educational policy 69
  The social responsibility of the language tester 70
  Ethical language testing 72
  Accountability 72
  Washback 73
  Test impact 74
  Codes of professional ethics for language testers 75
  Critical language testing 76
  Conclusion 77

8 New directions-and dilemmas? 79
  Computers and language testing 79
  Technology and the testing of speaking 81
  Dilemmas: whose performance? 83

SECTION 2
Readings 87

SECTION 3
References 121

SECTION 4
Glossary 131
Preface
Purpose
What justification might there be for a series of introductions to language study? After all, linguistics is already well served with introductory texts: expositions and explanations which are comprehensive, authoritative, and excellent in their way. Generally speaking, however, their way is the essentially academic one of providing a detailed initiation into the discipline of linguistics, and they tend to be lengthy and technical: appropriately so, given their purpose. But they can be quite daunting to the novice. There is also a need for a more general and gradual introduction to language: transitional texts which will ease people into an understanding of complex ideas. This series of introductions is designed to serve this need.
Their purpose, therefore, is not to supplant but to support the more academically oriented introductions to linguistics: to prepare the conceptual ground. They are based on the belief that it is an advantage to have a broad map of the terrain sketched out before one considers its more specific features on a smaller scale, a general context in reference to which the detail makes sense. It is sometimes the case that students are introduced to detail without it being made clear what it is a detail of. Clearly, a general understanding of ideas is not sufficient: there needs to be closer scrutiny. But equally, close scrutiny can be myopic and meaningless unless it is related to the larger view. Indeed it can be said that the precondition of more particular enquiry is an awareness of what, in general, the particulars are about. This series is designed to provide this large-scale view of different areas of language study. As such it can serve as preliminary to (and precondition for) the more specific and specialized enquiry which students of linguistics are required to undertake.
But the series is not only intended to be helpful to such students. There are many people who take an interest in language without being academically engaged in linguistics per se. Such people may recognize the importance of understanding language for their own lines of enquiry, or for their own practical purposes, or quite simply for making them aware of something which figures so centrally in their everyday lives. If linguistics has revealing and relevant things to say about language, this should presumably not be a privileged revelation, but one accessible to people other than linguists. These books have been so designed as to accommodate these broader interests too: they are meant to be introductions to language more generally as well as to linguistics as a discipline.
Design
The books in the series are all cut to the same basic pattern. There
are four parts: Survey, Readings, References, and Glossary.
Survey
This is a summary overview of the main features of the area of language study concerned: its scope and principles of enquiry, its basic concerns and key concepts. These are expressed and explained in ways which are intended to make them as accessible as possible to people who have no prior knowledge or expertise in the subject. The Survey is written to be readable and is uncluttered by the customary scholarly references. In this sense, it is simple. But it is not simplistic. Lack of specialist expertise does not imply an inability to understand or evaluate ideas. Ignorance means lack of knowledge, not lack of intelligence. The Survey, therefore, is meant to be challenging. It draws a map of the subject area in such a way as to stimulate thought and to invite a critical participation in the exploration of ideas. This kind of conceptual cartography has its dangers of course: the selection of what is significant, and the manner of its representation, will not be to the liking of everybody, particularly not, perhaps, to some of those inside the discipline. But these surveys are written in the belief that there must be an alternative to a technical account on the one hand, and an idiot's guide on the other if linguistics is to be made relevant to people in the wider world.
Readings
Some people will be content to read, and perhaps re-read, the
summary Survey. Others will want to pursue the subject and so
will use the Survey as the preliminary for more detailed study. The
Readings provide the necessary transition. For here the reader is
presented with texts extracted from the specialist literature. The
purpose of these Readings is quite different from the Survey. It is
to get readers to focus on the specifics of what is said, and how it
is said, in these source texts. Questions are provided to further
this purpose: they are designed to direct attention to points in
each text, how they compare across texts, and how they deal with
the issues discussed in the Survey. The idea is to give readers an
initial familiarity with the more specialist idiom of the linguistics
literature, where the issues might not be so readily accessible, and
to encourage them into close critical reading.
References
One way of moving into more detailed study is through the Readings. Another is through the annotated References in the third section of each book. Here there is a selection of works (books and articles) for further reading. Accompanying comments indicate how these deal in more detail with the issues discussed in the different chapters of the Survey.
Glossary
Certain terms in the Survey appear in bold. These are terms used in a special or technical sense in the discipline. Their meanings are made clear in the discussion, but they are also explained in the Glossary at the end of each book. The Glossary is cross-referenced to the Survey, and therefore serves at the same time as an index. This enables readers to locate the term and what it signifies in the more general discussion, thereby, in effect, using the Survey as a summary work of reference.
Use
The series has been designed so as to be flexible in use. Each title is separate and self-contained, with only the basic format in common. The four sections of the format, as described here, can be drawn upon and combined in different ways, as required by the needs, or interests, of different readers. Some may be content with the Survey and the Glossary and may not want to follow up the suggested References. Some may not wish to venture into the Readings. Again, the Survey might be considered as appropriate preliminary reading for a course in applied linguistics or teacher education, and the Readings more appropriate for seminar discussion during the course. In short, the notion of an introduction will mean different things to different people, but in all cases the concern is to provide access to specialist knowledge and stimulate an awareness of its significance. This series as a whole has been designed to provide this access and promote this awareness in respect to different areas of language study.
H. G. WIDDOWSON
Author's acknowledgements
Language testing is often thought of as an arcane and difficult field, and politically incorrect to boot. The opportunity to provide an introduction to the conceptual interest of the field and to some of its procedures has been an exciting one. The immediate genesis for this book came from an invitation from Henry Widdowson, who proved to be an illuminating and supportive editor throughout the process of the book's writing. It was an honour and a pleasure to work with him.
The real origins of the book lay further back, when over 15 years ago Terry Quinn of the University of Melbourne urged me to take up a consultancy on language testing at the Australian Language Centre in Jakarta. Terry has been an invaluable support and mentor throughout my career in applied linguistics, nowhere more so than in the field of language testing, which in his usual clear-sighted way he has always understood as being inherently political and social in character, a perspective which I am only now, after twelve years of research in the area, beginning to properly understand. I am also grateful to my other principal teachers about language testing, Alan Davies, Lyle Bachman, and Bernard Spolsky, and to my friend and colleague Elana Shohamy, from whom I have learnt so much in conversations long into the night about these and other matters. I also owe a deep debt to Sally Jacoby, a challenging thinker and great teacher, who has helped me frame and contextualize in new ways my work in this field. My colleagues at Melbourne, Brian Lynch and Alastair Pennycook, have dragged me kicking and screaming at least some way into the postmodern era. The Language Testing Research Centre at the University of Melbourne has been for over a decade the perfect environment within which thinking on language testing can flourish, and I am grateful to (again) Alan Davies and to Cathie Elder, and to all my other colleagues there. Whatever clarity the book may have is principally due to my dear friend and soulmate Lillian Nativ, who remains the most difficult and critical student I have had. Being a wonderful teacher herself she will never accept anything less than clear explanations. The students to whom I have taught language testing or whose research I have supervised over the years have also shaped this book in considerable ways. At OUP, I have had excellent help from Julia Sallabank and Belinda Penn.

On a more personal note I am grateful for the continuing support and friendship of Marie-Therese Jensen and the love of our son Daniel.
TIM McNAMARA
SECTION 1
Survey
Testing, testing ...
What is a language test?
Testing is a universal feature of social life. Throughout history people have been put to the test to prove their capabilities or to establish their credentials; this is the stuff of Homeric epic, of Arthurian legend. In modern societies such tests have proliferated rapidly. Testing for purposes of detection or to establish identity has become an accepted part of sport (drugs testing), the law (DNA tests, paternity tests, lie detection tests), medicine (blood tests, cancer screening tests, hearing and eye tests), and other fields. Tests to see how a person performs, particularly in relation to a threshold of performance, have become important social institutions and fulfil a gatekeeping function in that they control entry to many important social roles. These include the driving test and a range of tests in education and the workplace. Given the centrality of testing in social life, it is perhaps surprising that its practice is so little understood. In fact, as so often happens in the modern world, this process, which so much affects our lives, becomes the province of experts and we become dependent on them. The expertise of those involved in testing is seen as remote and obscure, and the tests they produce are typically associated with feelings of anxiety and powerlessness.

What is true of testing in general is true also of language testing, not a topic likely to quicken the pulse or excite much immediate interest. If it evokes any reaction, it will probably take the form of negative associations. For many, language tests may conjure up an image of an examination room, a test paper with questions, desperate scribbling against the clock. Or a chair outside the interview room and a nervous victim waiting with rehearsed phrases to be called into an inquisitional conversation with the examiners. But there is more to language testing than this.
To begin with, the very nature of testing has changed quite radically over the years to become less impositional, more humanistic, conceived not so much to catch people out on what they do not know, but as a more neutral assessment of what they do. Newer forms of language assessment may no longer involve the ordeal of a single test performance under time constraints. Learners may be required to build up a portfolio of written or recorded oral performances for assessment. They may be observed in their normal activities of communication in the language classroom on routine pedagogical tasks. They may be asked to carry out activities outside the classroom context and provide evidence of their performance. Pairs of learners may be asked to take part in role plays or in group discussions as part of oral assessment. Tests may be delivered by computer, which may tailor the form of the test to the particular abilities of individual candidates. Learners may be encouraged to assess aspects of their own abilities.

Clearly these assessment activities are very different from the solitary confinement and interrogation associated with traditional testing. The question arises, of course, as to how these different activities have developed, and what their principles of design might be. It is the purpose of this book to address these questions.
Understanding language testing
There are many reasons for developing a critical understanding of the principles and practice of language assessment. Obviously you will need to do so if you are actually responsible for language test development and claim expertise in this field. But many other people working in the field of language study more generally will want to be able to participate as necessary in the discourse of this field, for a number of reasons.

First, language tests play a powerful role in many people's lives, acting as gateways at important transitional moments in education, in employment, and in moving from one country to another. Since language tests are devices for the institutional control of individuals, it is clearly important that they should be understood, and subjected to scrutiny. Secondly, you may be working with language tests in your professional life as a teacher or administrator, teaching to a test, administering tests, or relying on information from tests to make decisions on the placement of students on particular courses.
Finally, if you are conducting research in language study you may need to have measures of the language proficiency of your subjects. For this you need either to choose an appropriate existing language test or design your own.

Thus, an understanding of language testing is relevant both for those actually involved in creating language tests, and also more generally for those involved in using tests or the information they provide, in practical and research contexts.
Types of test
Not all language tests are of the same kind. They differ with respect to how they are designed, and what they are for: in other words, in respect to test method and test purpose.

In terms of method, we can broadly distinguish traditional paper-and-pencil language tests from performance tests. Paper-and-pencil tests take the form of the familiar examination question paper. They are typically used for the assessment either of separate components of language knowledge (grammar, vocabulary, etc.) or of receptive understanding (listening and reading comprehension). Test items in such tests, particularly if they are professionally made standardized tests, will often be in fixed response format, in which a number of possible responses is presented from which the candidate is required to choose. There are several types of fixed response format, of which the most important is multiple choice format, as in the following example from a vocabulary test:
Select the most appropriate completion of the sentence.

I wonder what the newspaper says about the new play. I must read the _____.
(a) criticism
(b) opinion
(c) review *
(d) critic
Items in multiple choice format present a range of anticipated likely responses to the test-taker. Only one of the presented alternatives (the key, marked here with an asterisk) is correct; the others (the distractors) are based on typical confusions or misunderstandings seen in learners' attempts to answer the questions freely in try-outs of the test material, or on observation of errors made in the process of learning more generally. The candidate's task is simply to choose the best alternative among those presented. Scoring then follows automatically, and is indeed often done by machine. Such tests are thus efficient to administer and score, but since they only require picking out one item from a set of given alternatives, they are not much use in testing the productive skills of speaking and writing, except indirectly.
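The mechanics of such automatic scoring are simple enough to sketch in a few lines of code. The following is a minimal illustration only, not drawn from any actual testing system; the item numbers, answer key, and candidate responses are all invented.

```python
# A minimal sketch of machine scoring for fixed response items.
# The answer key and responses are invented for illustration;
# operational tests use dedicated scanning and scoring software.

ANSWER_KEY = {1: "c", 2: "a", 3: "d"}  # item number -> key option

def score(responses):
    """Score dichotomously: 1 if the chosen option is the key,
    0 if it is a distractor or the item was left blank."""
    return sum(1 for item, key in ANSWER_KEY.items()
               if responses.get(item, "").strip().lower() == key)

candidate = {1: "c", 2: "b", 3: "d"}   # key chosen for items 1 and 3
print(score(candidate))                # prints 2
```

The point the sketch makes is that once the key is fixed, no human judgement enters into the scoring at all, which is exactly why fixed response formats are so efficient, and also why they cannot directly assess production.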
In performance-based tests, language skills are assessed in an act of communication. Performance tests are most commonly tests of speaking and writing, in which a more or less extended sample of speech or writing is elicited from the test-taker, and judged by one or more trained raters using an agreed rating procedure. These samples are elicited in the context of simulations of real-world tasks in realistic contexts.
Test purpose
Language tests also differ according to their purpose. In fact, the same form of test may be used for differing purposes, although in other cases the purpose may affect the form. The most familiar distinction in terms of test purpose is that between achievement and proficiency tests.
Achievement tests are associated with the process of instruction. Examples would be: end of course tests, portfolio assessments, or observational procedures for recording progress on the basis of classroom work and participation. Achievement tests accumulate evidence during, or at the end of, a course of study in order to see whether and where progress has been made in terms of the goals of learning. Achievement tests should support the teaching to which they relate. Writers have been critical of the use of multiple choice standardized tests for this purpose, saying that they have a negative effect on classrooms as teachers teach to the test, and that there is often a mismatch between the test and the curriculum, for example where the latter emphasizes performance. An achievement test may be self-enclosed in the sense that it may not bear any direct relationship to language use in the world outside the classroom (it may focus on knowledge of particular points of grammar or vocabulary, for example). This will not be the case if the syllabus is itself concerned with the outside world, as the test will then automatically reflect that reality in the process of reflecting the syllabus. More commonly though, achievement tests are more easily able to be innovative, and to reflect progressive aspects of the curriculum, and are associated with some of the most interesting new developments in language assessment in the movement known as alternative assessment. This approach stresses the need for assessment to be integrated with the goals of the curriculum and to have a constructive relationship with teaching and learning. Standardized tests are seen as too often having a negative, restricting influence on progressive teaching. Instead, for example, learners may be encouraged to share in the responsibility for assessment, and be trained to evaluate their own capacities in performance in a range of settings in a process known as self-assessment.
Whereas achievement tests relate to the past in that they measure what language the students have learned as a result of teaching, proficiency tests look to the future situation of language use without necessarily any reference to the previous process of teaching. The future 'real life' language use is referred to as the criterion. In recent years tests have increasingly sought to include performance features in their design, whereby characteristics of the criterion setting are represented. For example, a test of the communicative abilities of health professionals in work settings will be based on representations of such workplace tasks as communicating with patients or other health professionals. Courses of study to prepare candidates for the test may grow up in the wake of its establishment, particularly if it has an important gatekeeping function, for example admission to an overseas university, or to an occupation requiring practical second language skills.
The criterion
Testing is about making inferences; this essential point is obscured by the fact that some testing procedures, particularly in performance assessment, appear to involve direct observation. Even where the test simulates real world behaviour (reading a newspaper, role playing a conversation with a patient, listening to a lecture), test performances are not valued in themselves, but only as indicators of how a person would perform similar, or related, tasks in the real world setting of interest. Understanding testing involves recognizing a distinction between the criterion (relevant communicative behaviour in the target situation) and the test. The distinction between test and criterion is set out for performance-based tests in Figure 1.1.
[Figure 1.1 Test and criterion. Test: a performance or series of performances, simulating, representing, or sampled from the criterion (observed). Criterion: a series of performances subsequent to the test; the target (unobservable). Characterization of the essential features of the criterion influences the design of the test, and test performances are used to make inferences about the criterion.]
Test performances are used as the basis for making inferences about criterion performances. Thus, for example, listening to a lecture in a test is used to infer how a person would cope with listening to lectures in the course of study he/she is aiming to enter. It is important to stress that although this criterion behaviour, as relevant to the appropriate communicative role (as nurse, for example, or student), is the real object of interest, it cannot be accounted for as such by the test. It remains elusive since it cannot be directly observed.

There has been a resistance among some proponents of direct testing to this idea. Surely test tasks can be authentic samples of behaviour? Sometimes it is true that the materials and tasks in language tests can be relatively realistic, but they can never be real. For example, an oral examination might include a conversation, or a role-play appropriate to the target destination. In a test of English for immigrant health professionals, this might be between a doctor and a patient. But even where performance test materials appear to be very realistic compared to traditional paper-and-pencil tests, it is clear that the test performance does not exist for its own sake. The test-taker is not really reading the newspaper provided in the test for the specific information within it; the test-taking doctor is not really advising the 'patient'. As one writer famously put it, everyone is aware that in a conversation used to assess oral ability 'this is a test, not a tea party'. The effect of test method on the realism of tests will be discussed further in Chapter 3.
There are a number of other limits to the authenticity of tests, which force us to recognize an inevitable gap between the test and the criterion. For one thing, even in those forms of direct performance assessment where the period in which behaviour is observed is quite extended (for example, a teacher's ability to use the target language in class may be observed on a series of lessons with real students), there comes a point at which we have to stop observing and reach our decision about the candidate, that is, make an inference about the candidate's probable behaviour in situations subsequent to the assessment period. While it may be likely that our conclusions based on the assessed lessons may be valid in relation to the subsequent unobserved teaching, differences in the conditions of performance may in fact jeopardize their validity (their generalizability). For example, factors such as the careful preparation of lessons when the teacher was under observation may not be replicated in the criterion, and the effect of this cannot be known in advance. The point is that observation of behaviour as part of the activity of assessment is naturally self-limiting, on logistical grounds if for no other reason. In fact, of course, most test situations allow only a very brief period of sampling of candidate behaviour, usually a couple of hours or so at most; oral tests may last only a few minutes. Another constraint on direct knowledge of the criterion is the testing equivalent of the Observer's Paradox: that is, the very act of observation may change the behaviour being observed. We all know how tense being assessed can make us, and conversely how easy it sometimes is to play to the camera, or the gallery.

In judging test performances then, we are not interested in the observed instances of actual use for their own sake; if we were, and that is all we were interested in, the sample performance would not be a test. Rather, we want to know what the particular performance reveals of the potential for subsequent performances in the criterion situation. We look, so to speak, underneath or through the test performance to those qualities in it which are indicative of what is held to underlie it.
If our inferences about subsequent candidate behaviour are wrong, this may have serious consequences for the candidate and others who have a stake in the decision. Investigating the defensibility of the inferences about candidates that have been made on the basis of test performance is known as test validation, and is the main focus of testing research.
The test-criterion relationship
The very practical activity of testing is inevitably underpinned by theoretical understanding of the relationship between the criterion and test performance. Tests are based on theories of the nature of language use in the target setting, and the way in which this is understood will be reflected in test design. Theories of language and language in use have of course developed in very different directions over the years, and tests will reflect a variety of theoretical orientations. For example, approaches which see performance in the criterion as an essentially cognitive activity will understand language use in terms of cognitive constructs such as knowledge, ability, and proficiency. On the other hand, approaches which conceive of criterion performance as a social and interactional achievement will emphasize social roles and interaction in test design. This will be explored in detail in Chapter 2.

However, it is not enough simply to accept the proposed relationship between criterion and test implicit in all test design. Testers need to check the empirical evidence for their position in the light of candidates' actual performance on test tasks. In other words, analysis of test data is called for, to put the theory of the test-criterion relationship itself to the test. For example, current models of communicative ability state that there are distinct aspects of that ability, which should be measured in tests. As a result, raters of speaking skills are sometimes required to fill in a grid where they record separate impressions of aspects of speaking such as pronunciation, appropriateness, grammatical accuracy, and the like. Using data (test scores) produced by such procedures, we will be in a position to examine empirically the relationship between scores given under the various categories. Are the categories indeed independent? Test validation thus involves two things. In the first place, it involves understanding how, in principle, performance on the test can be used to infer performance in the criterion. In the second place, it involves using empirical data from test performances to investigate the defensibility of that understanding and hence of the interpretations (the judgements about test-takers) that follow from it. These matters will be considered in detail in Chapter 5, on test validity.
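To give a concrete sense of the empirical side of this process, here is a minimal sketch of how one might begin to probe whether two analytic rating categories behave independently. The rating data are invented, and a real validation study would involve many more candidates, multiple raters, and techniques such as factor analysis rather than a single correlation; the sketch only illustrates the logic.

```python
# A minimal sketch: if scores on two rating categories correlate
# very highly across candidates, the categories may not be
# measuring distinct aspects of ability. The 1-6 scale scores
# below are invented for illustration.
from statistics import correlation  # available in Python 3.10+

pronunciation = [4, 5, 3, 6, 2, 4, 5, 3]
grammatical_accuracy = [4, 5, 2, 6, 3, 4, 6, 3]

r = correlation(pronunciation, grammatical_accuracy)
print(f"Pearson r = {r:.2f}")  # r near 1.0 suggests the categories overlap
```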
Conclusion
In this chapter we have looked at the nature of the test-criterion relationship. We have seen that a language test is a procedure for gathering evidence of general or specific language abilities from performance on tasks designed to provide a basis for predictions about an individual's use of those abilities in real world contexts. All such tests require us to make a distinction between the data of the learner's behaviour, the actual language that is produced in test performance, and what these data signify, that is to say what they count as in terms of evidence of 'proficiency', 'readiness for communicative roles in the real world', and so on. Testing thus necessarily involves interpretation of the data of test performance as evidence of knowledge or ability of one kind or another. Like the soothsayers of ancient Rome, who inspected the entrails of slain animals in order to make their interpretations and subsequent predictions of future events, testers need specialized knowledge of what signs to look for, and a theory of the relationship of those signs to events in the world. While language testing resembles other kinds of testing in that it conforms to general principles and practices of measurement, it is distinctive in that the signs and evidence it deals with have to do specifically with language. We need then to consider how views about the nature of language have had an impact on test design.
Communication and the design
of language tests
Essential to the activities of designing tests and interpreting the meaning of test scores is the view of language and language use embodied in the test. The term test construct refers to those aspects of knowledge or skill possessed by the candidate which are being measured. Although this term is taken from psychology, we should note that the knowledge or skill being assessed does not have to be defined in psychological terms. Thus some scholars have taken a social rather than psychological view of language performance and would define the test construct accordingly. Defining the test construct involves being clear about what knowledge of language consists of, and how that knowledge is deployed in actual performance (language use). Understanding what view the test takes of language use in the criterion is necessary for determining the link between test and criterion in performance testing. This is not just an academic matter. It has important practical implications, because according to what view the test takes, the 'look' of the test will be different, reporting of scores will change, and test performance will be interpreted differently. The difference of format between paper-and-pencil tests and performance tests is not just incidental; it reflects an implicit difference between views of language and language use.
Discrete point tests
Early theories of test performance, influenced by structuralist linguistics, saw knowledge of language as consisting of mastery of the features of the language as a system. This position was clearly articulated by Robert Lado in his highly influential book Language Testing, published in 1961. Testing focused on candidates' knowledge of the grammatical system, of vocabulary, and of aspects of pronunciation. There was a tendency to atomize and decontextualize the knowledge to be tested, and to test aspects of knowledge in isolation. Thus, the points of grammar chosen for assessment would be tested one at a time; and tests of grammar would be separate from tests of vocabulary. Material to be tested was presented with minimal context, for example in an isolated sentence. This practice of testing separate, individual points of knowledge, known as discrete point testing, was reinforced by theory and practice within psychometrics, the emerging science of the measurement of cognitive abilities. This stressed the need for certain properties of measurement, particularly reliability, or consistency of estimation of candidates' abilities. It was found that this could be best achieved through constructing a test consisting of many small items all directed at the same general target, say, grammatical structure or vocabulary knowledge. In order to test these individual points, item formats of the multiple choice question type were most suitable. While there was also realization among some writers that the integrated nature of performance needed to be reflected somewhere in a test battery, the usual way of handling this integration was at the level of skills testing, so that the four language macroskills of listening, reading, writing, and speaking were in various degrees tested (again, in strict isolation from one another) as a supplement to discrete point tests. This period of language testing has been called the psychometric-structuralist period and was in its heyday in the 1960s; but the practices adopted at that time have remained hugely influential.
Integrative and pragmatic tests
Within a decade, the necessity of assessing the practical language skills of foreign students wishing to study at universities in Britain and the US, together with the need within the communicative movement in teaching for tests which measured productive capacities for language, led to a demand for language tests which involved an integrated performance on the part of the language user. The discrete point tradition of testing was seen as focusing too exclusively on knowledge of the formal linguistic system for its own sake rather than on the way such knowledge is used to achieve communication. The new orientation resulted in the development of tests which integrated knowledge of relevant systematic features of language (pronunciation, grammar, vocabulary) with an understanding of context. As a result, a distinction was drawn between discrete point tests and integrative tests such as speaking in oral interviews, the composing of whole written texts, and tests involving comprehension of extended discourse (both spoken and written). The problem was that such integrative tests tended to be expensive, as they were time consuming and difficult to score, requiring trained raters; and in any case were potentially unreliable (that is, where judges were involved, the judges would disagree).
Research carried out by the American, John Oller, in the 1970s seemed to offer a solution. Oller offered a new view of language and language use underpinning tests, focusing less on knowledge of language and more on the psycholinguistic processing involved in language use. Language use was seen as involving two factors: (1) the on-line processing of language in real time (for example, in naturalistic speaking and listening activities), and (2) a 'pragmatic mapping' component, that is, the way formal knowledge of the systematic features of language was drawn on for the expression and understanding of meaning in context. A test of language use had to involve both of these features, neither of which was felt to be captured in the discrete point tradition of testing. Further, Oller proposed what came to be known as the Unitary Competence Hypothesis, that is, that performance on a whole range of tests (which he termed pragmatic tests) depended on the same underlying capacity in the learner: the ability to integrate grammatical, lexical, contextual, and pragmatic knowledge in test performance. He argued that certain kinds of more economical and efficient tests, particularly the cloze test (a gap-filling reading test), measured the same kinds of skills as those tested in productive tests of the types listed above. It was argued that a cloze test was an appropriate substitute for a test of productive skills because it required readers to integrate grammatical, lexical, contextual, and pragmatic knowledge in order to be able to supply the missing words. A cloze test was a reading test, consisting of a text of approximately 400 words in length. After an introductory sentence or two which was left intact, words were systematically removed (every 5th, 6th, or 7th word was a typical procedure) and replaced with a blank. The task was for the reader to supply the missing word. Various scoring methods (exact word replacement, any acceptable word replacement) were tried out and seemed to provide much the same information about the relative abilities of readers. Such tests were easy to construct, relatively easy to score, were based on a compelling theory of language use, and seemed an attractive alternative to more elaborate and expensive tests of the productive skills of speaking and writing. The cloze thus became a very popular form of test in the 1970s and early 1980s (and is still widely used today).
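The construction procedure just described is mechanical enough to automate. Below is a minimal sketch of a cloze generator along those lines (lead-in left intact, every nth word thereafter deleted); the function name and the crude word-count approximation of 'an introductory sentence or two' are my own choices, and the sample text is invented.

```python
# A minimal sketch of cloze test construction: keep a lead-in intact,
# then delete every nth word and replace it with a numbered blank.

def make_cloze(text, n=7, lead_in_words=15):
    """Return (gapped_text, deleted_words).

    `lead_in_words` crudely approximates 'an introductory sentence
    or two left intact'; a real test would split on sentence
    boundaries and use a text of around 400 words.
    """
    words = text.split()
    gapped, deleted = [], []
    for i, word in enumerate(words):
        # Delete the nth, 2nth, 3nth ... word after the lead-in.
        if i >= lead_in_words and (i - lead_in_words) % n == n - 1:
            deleted.append(word)
            gapped.append(f"({len(deleted)}) ______")
        else:
            gapped.append(word)
    return " ".join(gapped), deleted

sample = ("Testing is a universal feature of social life. Throughout "
          "history people have been put to the test to prove their "
          "capabilities or to establish their credentials, and in modern "
          "societies such tests have proliferated rapidly.")
gapped, answers = make_cloze(sample, n=6)
print(gapped)
print(answers)  # exact-word scoring compares responses to this list
```

The two scoring methods mentioned above differ only in the comparison step: exact word scoring checks a response against the deleted word verbatim, while acceptable word scoring checks it against a list of acceptable alternatives for each gap.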
Unfortunately, further work soon showed that cloze tests on the whole seemed mostly to be measuring the same kinds of things as discrete point tests of grammar and vocabulary. It seems that there are no short cuts in the testing of communicative skills.
Communicative language tests
From the early 1970s, a new theory of language and language use began to exert a significant influence on language teaching and potentially on language testing. This was Hymes's theory of communicative competence, which greatly expanded the scope of what was covered by an understanding of language and the ability to use language in context, particularly in terms of the social demands of performance. Hymes saw that knowing a language was more than knowing its rules of grammar. There were culturally specific rules of use which related the language used to features of the communicative context. For example, ways of speaking or writing appropriate to communication with close friends may not be the same as those used in communicating with strangers, or in professional contexts. Although the relevance of Hymes's theory to language testing was recognized more or less immediately on its appearance, it took a decade for its actual impact on practice to be felt, in the development of communicative language tests. Communicative language tests ultimately came to have two features:

1 They were performance tests, requiring assessment to be carried out when the learner or candidate was engaged in an extended act of communication, either receptive or productive, or both.

2 They paid attention to the social roles candidates were likely to assume in real world settings, and offered a means of specifying the demands of such roles in detail.

The second of these features distinguishes communicative language tests from the integrative/pragmatic testing tradition. The theory of communicative competence represented a profound shift from a psychological perspective on language, which sees language as an internal phenomenon, to a sociological one, focusing on the external, social functions of language.
Developments in Britain were particularly significant. The Royal Society of Arts developed influential examinations in English as a Foreign Language with innovative features such as the use of authentic texts and real world tasks; and the British Council and other authorities developed communicative tests of English as a Foreign Language for overseas students intending to study at British universities. These latter tests in some cases involved careful study of the communicative roles and tasks facing such students in Britain as the basis for test design; this stage of the process is known as a job analysis. This approach has continued to be used in the development of tests in occupational settings. For example, in the development of an Australian test of English as a second language for health professionals, those familiar with clinical situations in hospital settings were surveyed, and tasks such as communicating with patients, presenting cases to colleagues, and so on were identified and ranked according to criteria such as complexity, frequency, and importance as the basis for subsequent test task design. Test materials were then developed to simulate such roles and tasks where possible.
Models of communicative ability
The practical and imaginative response to the challenge of communicative language testing was matched by a continuing theoretical engagement with the idea of communicative competence and its implications for the performance requirement of communicative language testing. Various writers have tried to specify the components of communicative competence in second languages and their role in performance. This has been done in order to provide a comprehensive framework for test development and testing research, and a basis for the interpretation of test performance.
In their first form, such models specified the components of knowledge of language without dealing in detail with their role in performance. Various aspects of knowledge or competence were specified in the early 1980s by Michael Canale and Merrill Swain in Canada:

1 grammatical or formal competence, which covered the kind of knowledge (of systematic features of grammar, lexis, and phonology) familiar from the discrete point tradition of testing;

2 sociolinguistic competence, or knowledge of rules of language use in terms of what is appropriate to different types of interlocutors, in different settings, and on different topics;

3 strategic competence, or the ability to compensate in performance for incomplete or imperfect linguistic resources in a second language; and

4 discourse competence, or the ability to deal with extended use of language in context.

Note that strategic competence is oddly named as it is not a type of stored knowledge, as the first two aspects of competence appear to be, but a capacity for strategic behaviour in performance, which is likely to involve non-cognitive issues such as confidence, preparedness to take risks, and so on. Discourse competence similarly has elements of a general intellectual flexibility in negotiating meaning in discourse, in addition to a stored knowledge aspect, in this case, knowledge of the way in which links between different sentences or ideas in a text are explicitly marked, through the use of pronouns, conjunctions, and the like.
Further years of discussion and reflection on this framework have led to its more detailed reformulation. There has, to begin with, been a further specification of different components of knowledge that would appear to be included in communicative competence. Thus Lyle Bachman, for example, has identified subcategories of knowledge within the broader categories of grammatical, discourse, and sociolinguistic competencies. At the same time, strategic competence no longer features as a component of such knowledge. In fact, the notion of strategic competence remains crucial in understanding second language performance, but it has been reconceptualized. Instead of referring to a compensatory strategy for learners, it is seen as a more general phenomenon of language use. In this view, strategic competence is understood as a general reasoning ability which enables one to negotiate meaning in context.

This reworking of the idea of strategic competence has important implications for assessment. If strategic competence is not