part of an overall assessment of their readiness to practise their profession.

Obviously, as assessment becomes more authentic, it also becomes more expensive, complex, and potentially unwieldy. As assessment becomes more thoroughly contextualized, it is clear that a range of complex non-language-specific contextual variables will become relevant in such assessments, of the kind discussed in Chapter 2. This raises the difficult questions of validity discussed there, which will be considered further in Chapter 5.
Fixed and constructed response formats
An alternative to grappling with the dilemma of authenticity of response involves accepting to a greater or lesser degree the artificiality of the test situation, and using a range of conventional and
possibly inauthentic test formats. Of course, this forces us to face the issue of the validity of the inferences we can make from performance within such formats.
Different response formats are sometimes conventionally associated with different types of test content. Tests of discrete points of grammar, for example, often use multiple choice question (MCQ) format (see Chapter 1). MCQ is also commonly used in
vocabulary tests; alternatively, lists of vocabulary items and possible definitions may be presented, with candidates having to match items and definitions. Usually there are unequal numbers of items and definitions, to prevent the last few matches becoming predictable through a process of elimination. Tests of reading and listening comprehension often use either one of the formats just discussed, or true-false formats, in which the candidates have to say whether a given proposition corresponds with one in the stimulus text or not. The propositions in the test question are based on rewordings of propositions in the text, not direct lifting of words or phrases from the text. Without paraphrase, the task may require nothing more than a literal matching of similar words or phrases in the text and the question, rather than an understanding of the meaning of the propositions involved.
In this section we have so far considered fixed response formats, that is, ones in which the candidates' possible responses have been anticipated and the candidate's task is to choose the appropriate response from those offered.
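To make the mechanics concrete, here is a minimal sketch in Python of how a fixed response test can be scored; the items, answer key, and four-option format are invented for illustration, not drawn from any real test.

    # Invented answer key and one candidate's responses for a
    # four-option MCQ test; scoring is a mechanical comparison.
    answer_key = {1: "b", 2: "d", 3: "a", 4: "c"}
    responses = {1: "b", 2: "c", 3: "a", 4: "c"}

    score = sum(1 for item, key in answer_key.items()
                if responses.get(item) == key)
    print(f"Score: {score}/{len(answer_key)}")  # Score: 3/4

    # With four options per item, blind guessing succeeds on average
    # one time in four, so the expected chance score here is 1 of 4.

Because every acceptable response is fixed in advance, no judgement is required at scoring time, which is why such formats can be marked automatically; constructed response formats, considered next, give up exactly this property.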
Constructed response formats may also be used, although these are more complex and usually more expensive to score. For example, in a cloze test (see Chapter 2), candidates are required to fill in the blanks in the passage. In response to a stimulus comprehension passage, candidates may be asked to provide written or oral responses to short answer questions, in which they are responsible for the wording of the answer. Constructed response formats have the advantage of not constraining the candidate to the same degree, and of reducing the effect of guessing. The candidate assumes greater responsibility for the response, and this may be perceived as in some ways more demanding and more authentic. The disadvantage of such response formats is that they are generally more expensive to score. They cannot be marked automatically by machine, and agreement among scorers on what constitutes an acceptable
answer needs to be achieved. This may involve multiple marking of scripts and discussion of discrepancies. Even in scoring performances on a cloze test, decisions have to be made about the acceptability of responses other than the exact word originally deleted. Sometimes another word may be equally acceptable in the gap; that is, it is syntactically and semantically appropriate. Determining the list of possible correct words and ensuring that all scorers are applying the scoring procedure correctly is time-consuming and therefore expensive. This takes us back to the issue of constraints mentioned above.
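As a rough illustration of what an acceptable-answer scoring procedure for a cloze test might involve, here is a small Python sketch; the gaps, word lists, and candidate answers are invented for the example.

    # Each gap maps to the set of words scorers have agreed to accept,
    # including the exact word originally deleted from the passage.
    acceptable = {
        1: {"went", "walked", "travelled"},
        2: {"the"},
        3: {"quickly", "fast"},
    }
    candidate_answers = {1: "walked", 2: "the", 3: "rapidly"}

    score = sum(1 for gap, words in acceptable.items()
                if candidate_answers.get(gap, "").strip().lower() in words)
    print(f"{score} of {len(acceptable)} gaps acceptable")  # 2 of 3

Even in this toy form the cost pressure is visible: someone must decide whether 'rapidly' belongs in the set for gap 3, and every such decision has to be applied consistently by all scorers.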
In the testing of productive skills, a further range of test method decisions needs to be made, about the content and format of the stimulus to writing or speaking (the prompt), the length and format of the response, and about the scoring. For example, in writing assessment, decisions will need to be made about such matters as the number, length, and complexity of tasks set, the degree of support in terms of content provided, the source of ideas for content where such support is provided, whether a choice of topic is permitted, and the exact wording of the instructions to candidates (the rubric). In addition, procedures for scoring, particularly the criteria against which the performance will be judged, need to be developed. If performances are to be judged against rating scales, then the scales need to be developed. What has been said here about writing also applies to the assessment of speaking skills, through interviews, role plays, group discussions, and other procedures. The assessment of productive skills is considered in detail in Chapter 4. Of course, much of what has been said in this paragraph applies as well to the design of relatively authentic tests of productive skills.
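The notion of a rating scale can be suggested with a small invented sketch; the criteria, band range, and descriptors below are hypothetical, and real scales are developed and trialled empirically rather than written this casually.

    # Hypothetical analytic rating scale: each criterion is rated on
    # bands 1-5, with descriptors anchoring the top and bottom bands.
    rating_scale = {
        "task_fulfilment": {5: "fully addresses the prompt", 1: "largely off topic"},
        "organization": {5: "clear, logical structure", 1: "no discernible structure"},
        "accuracy": {5: "rare minor errors", 1: "errors impede meaning"},
    }

    def overall_band(ratings):
        # Average the bands a rater has awarded across the criteria.
        return sum(ratings.values()) / len(ratings)

    print(overall_band({"task_fulfilment": 4, "organization": 3, "accuracy": 4}))  # about 3.7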
Test specifications
The result of the design process in terms of test content and test method is the creation of test specifications. These are a set of instructions for creating the test, written as if they are to be followed by someone other than the test developer; they are a recipe or blueprint for test construction. Their function is to force explicitness about the design decisions in the test and to allow new versions to be written in future.
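The flavour of such a blueprint can be suggested with a short hypothetical sketch; the fields and values are invented, chosen only to show how specifications make design decisions explicit enough for someone else to construct a parallel version of the test.

    # Hypothetical fragment of a test specification, expressed as data;
    # each field records a design decision a new item writer must follow.
    reading_section_spec = {
        "skill": "reading comprehension",
        "number_of_texts": 3,
        "text_length_words": (250, 400),  # minimum, maximum
        "response_format": "four-option MCQ",
        "items_per_text": 5,
        "rubric": "Choose the best answer. Mark ONE option only.",
        "time_allowed_minutes": 45,
    }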
The specifications will include information on such matters as the length and structure of each