NCEA Report by Professor Paul Black 2000

Professor Paul Black of King's College, London, was invited to write a report on the proposals for the National Certificate of Educational Achievement in 2000. This page provides a summary and analysis of Professor Black's report, and the report itself is available for download below.

1 Reliability

The paper notes that little had been written on reliability in the policy design. It points out that this is a particular issue for external examinations, whose results, Black argues, are very likely to be far less trustworthy than most of the public believe. The paper further argues that the cut-off point of most concern for credit purposes is the pass/fail boundary, and that assessments should be designed for maximum reliability at this boundary. To mitigate this, Professor Black argues for judgements to be made more directly on the evidence as a whole, rather than by aggregating separate judgements.

We accept that the policy documents made limited reference to issues of reliability, especially in relation to externally examined standards. This issue was also raised by Cedric Hall. It is something we will monitor in practice, although we note that the issue is neither new nor limited to this approach to assessment and certification.

The issues raised in the discussion of grade boundaries have informed the policy on the approaches taken to marking. Marking schedules do not, in most cases, call for points allocations, but rather for decisions about whether or not the candidate's work has displayed sufficient evidence that the criterion has been met. This approach is itself not without contention, because some critics wish to see whole subjects reported only as a single aggregated score. This is not to say that decisions at the grade boundaries are unproblematic; they are not. We accept the argument that the clearest grade distinction needs to be made at the no credit/credit boundary, as that is the point of "highest stakes" for most candidates. It would seem therefore that the use of multiple reporting formats

- individual standards

- grade point averages

- certification at 80 credits

will enable the users to determine the extent to which they pursue this issue.

In the current models we do not see the time (or degree of sampling) devoted to individual standards in the external examinations as being any more or less than that for the equivalent topics in a conventional examination. Reliability at a standard or topic level will therefore be no better, but no worse, than before. Hall has argued that some research suggests that aggregating marks across a number of topics to get a whole-subject score produces greater reliability than the marks for the individual questions. If this is true, then we accept that, until more work is done on reliability, reporting on the individual standards may be less reliable than current examination scores.
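The aggregation argument can be illustrated with the Spearman-Brown formula from classical test theory. This is a minimal sketch only: neither Black nor Hall is quoted here as using this formula, the function name spearman_brown is hypothetical, the component reliability of 0.5 is an invented figure rather than NCEA data, and the formula assumes the components are parallel (equally reliable and measuring the same thing), which real standards may not be.

```python
def spearman_brown(component_reliability: float, n_components: int) -> float:
    """Predicted reliability of a score aggregated over n parallel components.

    Assumes every component has the same reliability and measures the same
    underlying attribute (the classical "parallel tests" assumption).
    """
    if not 0.0 <= component_reliability <= 1.0:
        raise ValueError("component_reliability must lie between 0 and 1")
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    r = component_reliability
    return n_components * r / (1 + (n_components - 1) * r)


if __name__ == "__main__":
    # Hypothetical reliability of one standard or topic reported on its own.
    single = 0.5
    for n in (1, 3, 5):
        print(f"{n} component(s): predicted reliability {spearman_brown(single, n):.2f}")
```

Under these assumptions a whole-subject score built from several components is predicted to be more reliable than any one component reported alone, which is the direction of the effect Hall describes. The size of the effect for actual standards would have to be established empirically.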

Towards the end of section 3.1 the paper notes that there is no work comparing the reliability of external and internal assessment. John Booreboom, in his doctoral thesis at the University of Canterbury, investigated the assessment of senior secondary school physics and concluded that the reliability of internal assessment was as great as that of external assessment. We accept that further work in this area will be essential.



Content last updated: 22 March 2010