The University of Durham’s Centre for Evaluation and Monitoring (CEM) tests are being used to decide who gets a place at an increasing number of our state-funded grammar schools.  Given the growing importance these tests play in the conduct of public affairs, we were rather disappointed with CEM’s response to our attempts to find out a little more about them.  In a recent discussion with Richard Adams from the Guardian he described CEM as “the good guys”, and on the whole we’d agree. They produced the report Evidence on the effects of selective educational systems on behalf of the Sutton Trust – an essential read for anyone with an interest in the subject – and they do seem to have a genuine desire to make the imperfect process of testing as good as it can be.

We asked CEM for information on the following three topics, and we feel they still have some way to go with their responses on all of them before we’d consider they were operating in a fully open and transparent manner.

Test results – Standardisation.

When the latest round of test results was published (late October 2014) we were immediately struck by the incredibly wide range of scores, spanning over nine standard deviations (−4.8σ to +4.33σ) around the mean.

We asked CEM for copies of the raw and standardised results so we could understand for ourselves how they had calculated the final standardised scores. CEM declined to provide this information on the grounds that merely releasing the results of the tests themselves would prejudice their commercial interests. An argument we disagree with and are pursuing.

Possibly thinking that we have a narrow personal agenda, CEM did provide the sort of information that parents of those who sat the tests would probably be interested in – the birth month, gender, cohort and standardised scores for all those who sat the Slough, Reading and Kendrick tests – but far from satisfying our curiosity, this only heightened it. The released information shows that CEM are doing something rather unusual with their standardisation calculations, and under such circumstances we’re very concerned that they seem to want to keep this a secret.

Standardised scores are worked out using the standard formula:

standardised score = 100 + 15 × (x − μ) / σ

where x is the raw score, μ is the mean score and σ is the standard deviation.


Straight away you can see that, with μ and σ (the mean and standard deviation) being fixed for the whole cohort who took the test, there can only be as many different standardised scores as there are different raw scores.   Normally in 11+ tests each question is given one mark for a correct answer and no marks for a wrong answer. The CEM entrance tests consist of two 50-minute papers containing verbal reasoning, non-verbal reasoning and mathematical questions, and we would estimate there are no more than 50 questions in any one category, so there should be a maximum of about 50 different possible standardised scores (actually 51, as you can also score zero).
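The point can be sketched in a few lines of code. Assuming the conventional mean-100, SD-15 scale (and a made-up cohort, since CEM won’t release the raw data), one-mark-per-question raw scores can never produce more distinct standardised scores than there are distinct raw scores:

```python
import random
import statistics

def standardise(raw_scores, mean=100, sd=15):
    """Map raw scores onto a standardised scale (default mean 100, SD 15)."""
    mu = statistics.mean(raw_scores)
    sigma = statistics.pstdev(raw_scores)
    return [mean + sd * (x - mu) / sigma for x in raw_scores]

# Hypothetical cohort: 1000 candidates, each scoring 0-50 whole marks.
random.seed(0)
raw = [random.randint(0, 50) for _ in range(1000)]
std = standardise(raw)

# The mapping is one-to-one, so distinct standardised scores
# can never exceed the (at most 51) distinct raw scores.
print(len(set(raw)), len(set(std)))
```

Because μ and σ are the same for everyone, equal raw scores always map to equal standardised scores – which is exactly why 1010 distinct scores from a 51-value raw scale is impossible.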

The released information for the Kendrick cohort, for example, shows there were 1108 candidates but 1010 different standardised scores. That is only possible if different questions are awarded different numbers of marks. We suspect that they may be weighting each question individually based on its difficulty, so a question which most candidates answer correctly does not score as highly as one which only a few get right. Then again, they could be disembowelling a monkey and looking at its entrails to decide on the scoring.
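One way such weighting could work – purely our speculation, since CEM have confirmed nothing – is to make each item worth (1 − p) marks, where p is the fraction of candidates who answered it correctly. Even five items scored this way produce many more distinct totals than one-mark-per-question scoring allows:

```python
from itertools import product

# Hypothetical difficulty weights (our speculation, not CEM's method):
# item i is worth (1 - p_i) marks, where p_i is the fraction of
# candidates who answered it correctly.
p_correct = [0.9, 0.7, 0.5, 0.3, 0.1]           # five illustrative items
weights = [round(1 - p, 2) for p in p_correct]  # harder items score more

# Enumerate every right/wrong pattern across the five items and
# collect the distinct weighted totals (rounded to dodge float noise).
totals = {round(sum(w for w, right in zip(weights, pattern) if right), 2)
          for pattern in product([False, True], repeat=len(weights))}

# One-mark-per-question scoring gives only 6 possible totals (0..5);
# difficulty-weighting the same five items gives 24.
print(len(totals))  # → 24
```

Scale that up to a hundred or so items and it is easy to see how 1108 candidates could land on 1010 distinct scores.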

We’d really like CEM to provide us with the information we requested so we can make our own informed decision about whether this is a good thing or not. Damaging commercial interests? The information they released is in the public domain. It would take a competitor the same three or four minutes it took us to realise that CEM have moved away from the standard one-mark-per-question approach, and about the same length of time to speculate how they’re doing it – although probably not the monkey-guts thing but the other one.

Accuracy of the tests

We requested the 95% confidence interval for a score of 110 in the Reading/Kendrick tests and 111 in the Slough tests.  Confidence intervals are just the sort of thing that statisticians get excited about because they provide an objective measure of the accuracy of what’s being measured. CEM initially said they don’t have this information – rather remarkable for an organisation which specialises in objective research into testing. We explained we were interested in any information they may have relating to the accuracy of the tests.  CEM have replied that if we specifically request the information they hold then they can provide it, but they’ve not told us what information they hold, making it impossible to frame such a request.
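For context, here is the standard textbook calculation – using an illustrative reliability coefficient, not CEM’s actual figures, which is precisely the number we asked them for. The 95% confidence interval of an observed score is derived from the test’s reliability via the standard error of measurement (SEM):

```python
import math

def score_confidence_interval(score, sd=15.0, reliability=0.9, z=1.96):
    """95% CI for an observed score: score +/- z * SEM,
    where SEM = sd * sqrt(1 - reliability)."""
    sem = sd * math.sqrt(1 - reliability)
    return (score - z * sem, score + z * sem)

# Illustrative only: reliability of 0.9 is a typical figure for a
# well-constructed test, not a published CEM value.
lo, hi = score_confidence_interval(110)
print(f"110 -> ({lo:.1f}, {hi:.1f})")  # → 110 -> (100.7, 119.3)
```

Even with a respectable reliability of 0.9 the interval around a score of 110 spans roughly nineteen points – which is why the confidence interval matters so much when a one-point difference decides a school place.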

We can’t help thinking that CEM do have information relating to the accuracy of the tests but they don’t want to provide it.  We’ve told them that we think their reply does not comply with section 16 of the Freedom of Information Act so we’ll have to see what they come back with.

Level of precision

There are no hard and fast rules about what level of precision should be used for any given measurement, although there are clearly some cases where extra precision is inappropriate.  For example, measuring time in milliseconds … with a sundial … on a ship … in a gale.  All other grammar schools report 11+ test results as whole integers (1/15σ). Assuming this is taken as a reasonable level of precision, there is no rational justification for increasing it unless accompanied by a proportional increase in accuracy. It was therefore quite a shock when the Reading/Kendrick scores – which are of exactly the same accuracy as the Slough results – were posted out at one hundred times greater precision.  This is all the more concerning in view of an undertaking given by Reading School to Rob Wilson MP that when they moved to CEM tests they would be guided by CEM’s recommendations on scoring.
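The precision-versus-accuracy point can be made concrete. Again using an illustrative SEM (CEM have not published one): when the measurement error spans several whole points, two scores that differ only in their decimal places are indistinguishable as far as the test can actually resolve:

```python
import math

# Illustrative standard error of measurement, assuming SD 15 and
# reliability 0.9 (our assumption, not a CEM figure): about 4.74 points.
sem = 15 * math.sqrt(1 - 0.9)

# Two candidates' scores as reported to two decimal places...
a, b = 110.27, 110.31

# ...differ by a few hundredths of a point, dwarfed by the error bar,
# so the extra precision distinguishes nothing real.
print(round(abs(a - b), 2), "vs SEM of", round(sem, 2))
```

The two decimal places serve only to rank candidates who, statistically, the test cannot tell apart.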

We’d like to know if Reading School went back on a written undertaking they gave to their local member of parliament or if alternatively, CEM told them that they should record the scores to two decimal places.  CEM declined to shed any light on this question saying that they don’t hold any information about which organisation suggested/recommended this.


2 Responses to CEM

  1. Mark says:

    1. Has the University of Durham got any evidence that the 11+ selective grammar school tests do indeed test innate ability and are resistant to preparation?

    Information not held. The University does not claim to test innate ability. The philosophy behind our approach is to design tests with the specific goal of minimising the impact of intensive coaching.

    2. Please provide me with any evidence the University of Durham have that the 11+ tests they produce for selective entry into grammar schools test innate ability and are resistant to preparation.

    See 1 above.

    3. State whether any trials were undertaken, e.g. two groups of children tested, with one group tutored or prepared for one year and one group not prepared, and their results compared? Was there any significant difference, and to what statistical significance?

    Trials have been undertaken as part of the development of each new assessment. The trials do not cover the effects of tutoring.

    4. Were any trials or results or evidence scrutinised by any independent organisation?

    Trial data was reviewed by an independent consultant appointed by a client.

    5. Is there any evidence innate ability is related to social class or Pupil Premium status?

    There is a large body of work investigating the links between socio-economic status and cognitive ability. The University does not believe it is for us to summarise such an extensive field of research.

    6. Does the University believe Pupil Premium children are at any disadvantage in its 11+ tests (consider the University’s claims that the tests are resistant to preparation, test innate ability, and that tuition is not required)?

    CEM is undertaking research in this area. The results of this research are not yet available.

    7. Are the tests designed to be “class less”, i.e. enable children of all socio-economic groups or social class to perform equally well and offer no advantage to any group?

    Tests are designed to be fair and are structured to differentiate on cognitive ability only.

    8. Are the tests gender neutral? Is there any evidence to substantiate they are neutral?

    Yes. Yes.

    9. Is there any explanation why boys seem to score higher marks than girls in certain regions?

    Information not held.

    10. Are the tests racially neutral? Is there any evidence to substantiate they are neutral?

    Tests are designed to be fair and are structured to differentiate on cognitive ability only. Where ethnicity data have been made available, differential item functioning analysis between white British and non-white British candidates indicates no overall bias.

    11. Are any tests standardised on parental income or education of parents of children sitting the tests?


    12. Please provide a list of local authorities, school consortium or individual schools that use 11+ selective tests from CEM Centre for Evaluation and Monitoring.

    See attached.

    13. Did CEM Centre provide its clients with evidence tests are resistant to preparation or tests innate ability?

    Clients were briefed verbally on the Centre’s approach to test design and delivery.

    I hope this information is useful.

    Yours sincerely,

    Deputy Information and Data Protection Manager
    Governance and Executive Support
    Durham University

  2. Gary says:

    Keen to understand the whole process as DD is appearing this year, so went through this site.

    I believe I can understand why there are more than 50 (or 51) distinct standardised scores: age standardisation is also being done. If the age standardisation is done first, and to a higher precision (two decimal places), that might explain the higher number of possible standardised scores.

    Not sure but thought this might be relevant…
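    Gary’s hypothesis can be sketched numerically. With a hypothetical age adjustment (the function and its 0.25-marks-per-month factor are invented for illustration; CEM’s actual method is unknown), the same raw mark maps to a different adjusted score for each birth month:

```python
def age_adjusted(raw, age_months, mean_age=126.0):
    """Hypothetical adjustment: a small bonus per month younger than
    the cohort average, recorded to two decimal places."""
    return round(raw + 0.25 * (mean_age - age_months), 2)

# One raw mark, twelve birth months (ages 10y0m to 10y11m)...
same_raw = 40
scores = {age_adjusted(same_raw, m) for m in range(120, 132)}

# ...yields twelve distinct adjusted scores before standardisation
# even begins, multiplying the number of possible final scores.
print(sorted(scores))
```

    This alone would not get from 51 raw scores to 1010 distinct standardised scores, but combined with per-question weighting it easily could.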
