The University of Durham’s Centre for Evaluation and Monitoring (CEM) tests are being used to decide who gets a place at an increasing number of our state-funded grammar schools. Given the growing importance these tests are playing in the conduct of public affairs, we were rather disappointed with CEM’s response to our attempts to find out a little more about them. In a recent discussion with Richard Adams of the Guardian he described CEM as “the good guys” and on the whole we’d agree. They produced the report Evidence on the effects of selective educational systems on behalf of the Sutton Trust, an essential read for anyone with an interest in the subject, and they do seem to have a genuine desire to make the imperfect process of testing as good as it can be.
We asked CEM for information on the following three topics, and we feel they still have some way to go with their responses on all of them before we’d consider they’re operating in a fully open and transparent manner.
Test results – standardisation
When the latest round of test results was published (late October 2014) we were immediately struck by the incredibly wide range of scores: from −4.8σ to +4.33σ, a spread of more than nine standard deviations.
We asked CEM for copies of the raw and standardised results so we could understand for ourselves how they had calculated the final standardised scores. CEM declined to provide this information on the grounds that merely releasing the results of the tests themselves would prejudice their commercial interests – an argument we disagree with and are pursuing.
Possibly thinking that we had a narrow personal agenda, CEM did provide the sort of information that parents of those who sat the tests would probably be interested in – the birth month, gender, cohort and standardised scores for all those who sat the Slough, Reading and Kendrick tests. Far from satisfying our curiosity, this only heightened it. The released information shows that CEM are doing something rather unusual with their standardisation calculations, and under such circumstances we’re very concerned that they seem to want to keep it a secret.
Standardised scores are worked out using the standard formula:

standardised score = 100 + 15 × (raw score − μ) / σ
Straight away you can see that with μ and σ (the mean and standard deviation) being fixed for the whole cohort who took the test, there can only be the same number of different standardised scores as there are different raw scores. Normally in 11+ tests each question is given one mark for a correct answer and no marks for a wrong answer. The CEM entrance tests consist of two 50-minute papers containing verbal reasoning, non-verbal reasoning and mathematical questions, and we would estimate there are no more than 50 questions in any one category, so there should be a maximum of about 50 different possible standardised scores (actually 51, as you can also score zero).
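The point can be checked with a few lines of code. This is a minimal sketch using simulated raw marks (the real marks are exactly what CEM declined to release): because the standardisation formula is a fixed linear mapping, the number of distinct standardised scores can never exceed the number of distinct raw scores.

```python
# Sketch of one-mark-per-question standardisation (mean 100, SD 15).
# Raw scores here are simulated, not CEM data.
import random
import statistics

random.seed(1)
raw = [random.randint(0, 50) for _ in range(1108)]  # 50 questions, 1 mark each

mu = statistics.mean(raw)
sigma = statistics.pstdev(raw)
standardised = [100 + 15 * (x - mu) / sigma for x in raw]

# The linear mapping is one-to-one, so the distinct counts must match,
# and neither can exceed 51 (scores 0 through 50).
print(len(set(raw)), len(set(standardised)))
```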
The released information for the Kendrick cohort, for example, shows there were 1108 candidates but 1010 different standardised scores. That is only possible if different questions are awarded different numbers of marks. We suspect that they may be weighting each question individually based on its difficulty, so that a question which most candidates answer correctly does not score as highly as one which only a few get right. Then again they could be disembowelling a monkey and looking at its entrails to decide on the scoring.
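To show that difficulty weighting would indeed produce this pattern, here is a hypothetical sketch – one plausible scheme among many, with simulated answers, not anything CEM have confirmed – in which each question’s mark is the proportion of candidates who got it wrong:

```python
# Hypothetical difficulty-weighted scoring: harder questions earn more.
# All data simulated; the weighting rule is our guess, not CEM's.
import random

random.seed(2)
n_candidates, n_questions = 1108, 50

# Right/wrong answer matrix; questions vary in how often they're answered correctly.
answers = [[random.random() < (q + 1) / (n_questions + 1) for q in range(n_questions)]
           for _ in range(n_candidates)]

# Weight for a question = fraction of candidates who got it wrong.
weights = [1 - sum(row[q] for row in answers) / n_candidates
           for q in range(n_questions)]

# A candidate's score is the sum of the weights of the questions they got right.
scores = [sum(w for w, right in zip(weights, row) if right) for row in answers]
print(len(set(scores)))  # far more distinct scores than the 51 flat marking allows
```

With weighted marks, two candidates tie only if their correct answers carry the same total weight, so the number of distinct scores approaches the number of candidates – exactly the 1010-out-of-1108 pattern seen in the Kendrick data.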
We’d really like CEM to provide the information we requested so we can make our own informed decision about whether this is a good thing or not. Damaging commercial interests? The information they released is in the public domain. It would take a competitor the same three or four minutes it took us to realise that CEM have moved away from the standard one-mark-per-question approach, and about the same length of time to speculate how they’re doing it – although probably not the monkey guts thing, the other one.
Accuracy of the tests
We requested the 95% confidence interval for a score of 110 in the Reading/Kendrick tests and 111 in the Slough tests. Confidence intervals are just the sort of thing that statisticians get excited about because they provide an objective measure of the accuracy of what’s being measured. CEM initially said they don’t have this information – rather remarkable for an organisation which specialises in objective research into testing. We explained we were interested in any information they may have relating to the accuracy of the tests. CEM have replied that if we specifically request the information they hold then they can provide it, but they’ve not told us what information they hold, making it impossible to frame such a request.
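For readers unfamiliar with the calculation, here is a sketch of how such a confidence interval is conventionally derived from a test’s reliability. The reliability figure of 0.90 below is purely illustrative – it is precisely the sort of number CEM have not disclosed:

```python
# 95% confidence interval for a standardised score via the standard error
# of measurement (SEM). The reliability of 0.90 is an assumed figure for
# illustration only; CEM have not published theirs.
import math

sd = 15.0           # standard deviation of standardised scores
reliability = 0.90  # ASSUMED, not a CEM figure
score = 110.0

sem = sd * math.sqrt(1 - reliability)   # standard error of measurement
lo, hi = score - 1.96 * sem, score + 1.96 * sem
print(f"{score:.0f} ± {1.96 * sem:.1f}  →  ({lo:.1f}, {hi:.1f})")
```

Even at a healthy reliability of 0.90, a reported 110 carries an interval of roughly ±9 points either side – which is why the interval matters so much when a single cut-off score decides a school place.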
We can’t help thinking that CEM do have information relating to the accuracy of the tests but they don’t want to provide it. We’ve told them that we think their reply does not comply with section 16 of the Freedom of Information Act so we’ll have to see what they come back with.
Level of precision
There are no hard and fast rules about what level of precision should be used for any given measurement, although there are clearly some cases where a given level of precision is inappropriate – for example, measuring time in milliseconds … with a sundial … on a ship … in a gale. All other grammar schools report 11+ test results as whole numbers (a resolution of 1/15σ). Assuming this is taken as a reasonable level of precision, there is no rational justification for increasing it unless accompanied by a proportional increase in accuracy. It was therefore quite a shock when the Reading/Kendrick scores – which are of exactly the same accuracy as the Slough results – were posted out to one hundred times greater precision. This is all the more concerning in view of an undertaking given by Reading School to Rob Wilson MP that when they moved to CEM tests they would be guided by CEM’s recommendations on scoring.
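The “one hundred times” figure follows directly from the step sizes involved, as this small sketch shows:

```python
# Whole-number scores move in steps of 1/15 of a standard deviation;
# two-decimal-place scores move in steps of 0.01/15. The ratio of the
# two step sizes is the claimed gain in precision.
sd = 15.0
integer_step = 1 / sd      # resolution of whole-number reporting, in sigma
two_dp_step = 0.01 / sd    # resolution of two-decimal-place reporting, in sigma
print(integer_step / two_dp_step)  # 100x finer, with no gain in accuracy
```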
We’d like to know whether Reading School went back on a written undertaking given to their local Member of Parliament or whether, alternatively, CEM told them that they should record the scores to two decimal places. CEM declined to shed any light on this question, saying that they don’t hold any information about which organisation suggested or recommended this.