Sunday, 31 August 2014

The great chain of being


Aristotle had the notion that organisms could be placed on a linear natural scale according to complexity of structure and function, such that higher organisms showed greater vitality and ability to move. Evolution has done something similar, building up the ladder of life from very basic micro-organisms and then scaling up to greater integrative complexity. In that sense, living things are concatenations and elaborations of earlier solutions, which have been tweaked through natural selection, refinements being added to the code, often making it a bit longer.

Yesterday a large concatenation of more than 25 authors (did they write a paragraph each?) published “Comparative analysis of the transcriptome across distant species” in Nature, a study of the overlap between humans, worms and flies.

They conclude: Overall, our comparison of the transcriptomes of three phylogenetically distant metazoans highlights fundamental features of transcription conserved across animal phyla. First, there are ancient co-expression modules across organisms, many of which are enriched for developmentally important hourglass genes. These conserved modules have highly coordinated intra-organism expression during the phylotypic stage, but display diversified expression before and after. The expression clustering also aligns developmental stages between worm and fly, revealing shared expression programs between embryogenesis and metamorphosis. Finally, we were able to build a single model that could predict transcription in all three organisms from upstream histone marks using a single set of parameters for both protein-coding genes and non-coding RNAs. Overall, our results underscore the importance of comparing divergent model organisms to human to highlight conserved biological principles (and disentangle them from lineage-specific adaptations).

What has this got to do with our usual subject, intelligence? Well, Heitor Fernandes, Michael A. Woodley and Jan te Nijenhuis have been investigating whether there is a Great Chain of Being Intelligent. Could the same forces that created ancient co-expression modules have done so for the nervous system? Indeed, how could they not have done so? Evolution happens because organisms survive for long enough and in one piece so as to mature and reproduce. They cannot toss forwards only those genes which will conform to refined sensibilities generations later. Do all animals, like humans, have a general intelligence factor accounting for half of their abilities?

As you would expect from these three clever monkeys, the paper is rich with content and deploys some complex statistics. I do not consider myself to be a religious fundamentalist, but this gang even attack principal components analysis. How does one deal with such iconoclasts, who seem bent on replacing the ancient verity with Principal Axis Analysis (see their apostasy below)?

Anyway, our three monkeys have respected some traditions in that they evaluate animal intelligence in terms of a Five Factor model of primate IQ: Innovation, tool use, social learning, tactical deception and extractive foraging.

This is what they say:

General intelligence has been shown to exist within and among species of mammals and birds. An important question concerns whether it is the principal source of differences in cognitive abilities between species, as is the case with comparisons involving many human populations. Using meta-analytic databases of ethological observations of cognitive abilities involving 69 primate species, we found that cognitive abilities that load more strongly on a common factor (which is here termed G, in line with the terminology developed in previous literature to describe aggregated measures of general intelligence) are associated with significantly bigger interspecies differences and bigger interspecies variance. Additionally, two novel evolutionary predictions were made: more G-loaded abilities would present (1) weaker phylogenetic signals, indicating less phylogenetic conservativeness, and (2) faster rates of trait evolution, as it was hypothesized that G has been subjected to stronger selection pressures than narrower, more domain-specific abilities. These predictions were corroborated with phylogenetic comparative methods, with stronger effects among catarrhines (apes and Old World monkeys) than within the entire primate order. These data strongly suggest that G is the principal locus of selection in the macroevolution of primate intelligence. Implications for the understanding of population differences in cognitive abilities among human populations and for the theory of massive modularity applied to intelligence are discussed.

We obtained meta-analytical frequency-count data on five cognitive abilities from a total of 69 primate species. Data on four different cognitive abilities were obtained directly from Reader et al. (2011). Their meta-analytic database was produced by examining over 4000 articles published from 1925 to 2000 for reports of behaviors indicative of intelligence (described above) in extant primate species. Data on a fifth cognitive ability were obtained from a meta-analytic compilation produced by Byrne and Whiten (1990), and originally obtained by surveying the large memberships of the International Primatological Society, the Primate Society of Great Britain, the Association for the Study of Animal Behavior, the Animal Behavior Society, and the American Primatological Society combined.

Different species use complex problem-solving behaviors in different ecologies, thus the various senses (e.g., olfaction, hearing) have different weights of importance for different species with regard to how they perceive and identify ecological and social problems to be tackled, the motivation systems, dependence on rewards, and tolerance to frustration vary across species, thus it is extremely difficult to calibrate experimental conditions to the ecological idiosyncrasies of each species.
Additionally, experimental cognitive tests are not available for large numbers of species or on a sufficiently broad range of cognitive abilities. Hence the natural frequency-counts approach used in the collection of the current dataset is the most appropriate and ecologically valid estimate of intelligence for comparative studies, that is, studies in which macro evolutionary predictions are being tested at the cross-species level (Reader & Laland, 2002; Reader et al., 2011; see also Lefebvre, 2011).

We conducted a Principal Axis Factor analysis (which, contrary to principal components analysis, controls for error variance; Costello & Osborne, 2005) to test the factor structure of the five cognitive abilities. We also tested their factor structure with Unit Weighted Factoring (UWF), which avoids the well-known sample-specificity of factor-scoring coefficients produced by standard errors of inconsistent magnitudes in small samples (Gorsuch, 1983). Both factor analyses were conducted after residualizing each cognitive ability against research effort so as to avoid publication bias.
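
Two of the steps in that passage, residualizing each ability against research effort and unit-weighted scoring, can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the species counts, the ability names used as dictionary keys and the helper functions are all invented, a simple least-squares line stands in for whatever regression the authors ran, and the Principal Axis Factor analysis itself is left out.

```python
from statistics import mean, pstdev

def residualize(y, x):
    """Residuals of y after a simple least-squares regression on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def zscores(values):
    """Standardize a column to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def unit_weighted_composite(columns):
    """Average of z-scored columns: every ability gets the same weight."""
    standardized = [zscores(col) for col in columns]
    return [mean(row) for row in zip(*standardized)]

# Hypothetical frequency counts for five species (illustration only, not the paper's data).
research_effort = [10, 40, 25, 80, 5]
abilities = {
    "innovation": [1, 6, 3, 15, 0],
    "tool_use": [0, 4, 2, 12, 1],
    "social_learning": [2, 5, 4, 10, 0],
}

# Step 1: remove the part of each count that is predictable from research effort.
residuals = {name: residualize(col, research_effort) for name, col in abilities.items()}

# Step 2: one composite score per species, with equal weights for all abilities.
g_scores = unit_weighted_composite(list(residuals.values()))
```

The point of the equal weights is that nothing is estimated from the sample, so the composite cannot inherit the instability of factor-scoring coefficients in small samples.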

The findings reported here have substantial implications. Firstly, the species differences in intelligence and their variance from the mean are biggest on the more G-loaded cognitive abilities, as is also the case for population differences within the human species. This suggests that the evolutionary accounts developed to explain population cognitive differences in humans are plausible, as it is improbable that a “Factor X” (the term generally used to refer to putative environmental causes of population differences in cognitive abilities; Jensen, 1973) could be operating to create the findings reported here among primate species. Several putative “Factor Xs” involve systematic negative discrimination or stereotype threat (Sesardic, 2005). It is difficult to envisage how these social forces might extend across primate phylogenies. A more parsimonious account of the apparent ubiquity of validation for Spearman's hypothesis is that it results from more common-factor-loaded abilities simply being more revealing of taxonomic group differences owing to differential selection having operated historically on general intelligence to a greater extent than on narrower and more modular abilities, and that this is likely the same for human populations as it is for primate species, the principal difference being the duration of selection.

Comment: it looks as if a G factor (principal axis based on aggregated measures) can be extracted from primates, even though they cannot be tested with pencil and paper and spoken vocabulary definitions. Primates appear to conform to the same general intellectual factor g (based on individual measures) which underlies human abilities.

Darwin should have the last word (1871, p. 105):

“the difference in mind between man and the higher animals… is certainly one of degree and not of kind” (Italics added for emphasis).

Wednesday, 27 August 2014

Rotherham Child Abuse Scandal


There is widespread public revulsion at the disclosure that an estimated 1,400 white girls have been systematically raped, abused and prostituted by a gang of Pakistani men in Rotherham, Yorkshire. The Pakistani men are described as being “of Pakistani heritage” as if they were in charge of some ancient monument. This convolution arises because their common name is seen as tantamount to a hate crime. The white girls are described as being “vulnerable”. That word, well intentioned, arises because reporters cannot bear to spell out whether the girls are dull and thus exploitable, or the children of disordered families who did not look after them, and thus exploitable. The unspoken words are all demeaning, but beyond “vulnerable”  it would be good to know more about the paths which took them into the hands of their abusers.

The implication is that no-one cared for the girls very much, that many officials considered them responsible for their dissolute lives (even at age 12), and that most Councillors did not exert themselves to protect them. There is also a political under-current in that Rotherham Council apparently felt uneasy about saying that the gang was exclusively Pakistani. According to press reports this Council had 7 Diversity officers, now reduced to 4. I do not know if services are any better in Councils run by other political parties, but I doubt they can be much worse.

The current leader of the Council was on the Today program giving the usual excuses: “I had no idea at the time, we are still studying the report, lessons will be learned, procedures will be examined, and I see no reason to resign”.

Readers of this blog will know that this is all old news. I copy below my main objections: poor methods for estimating prevalence of child abuse, failure to compare numbers of perpetrators with ethnic group numbers in the population, and general evasiveness in reporting.

I offer two excuses for repeating the key points in these posts: 1) I had fewer readers then, so this might be new to you; 2) I emailed the Commissioners, asking them if they could send me their technical appendix or any further explanations about their procedures, particularly their failure to deal with ethnic over-representations properly, and never got a reply.

After reading the whole post, you might like to send them a note, asking why they did not compare their perpetrator numbers with the census figures for ethnic groups.

Friday, 23 November 2012

Reporting on child sexual abuse

The Children’s Commissioner’s report "I thought I was the only one" on child sexual exploitation mentions perpetrators’ ethnicities, but without saying whether the numbers were more or less as predicted from the Census. In fact, the Commission’s own figures show that only half the predicted number of White perpetrators were actually found (43% versus 88%), twice the number of “mixed” ethnicity (3.8% versus 1.8%), almost 5 times the number of Asians (33% versus 6.7%) and almost 7 times the number of Blacks (19% versus 2.8%).
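
The multiples quoted in that paragraph are simple ratios, and a short Python sketch makes the arithmetic checkable. It assumes that the first figure in each pair is the group's share of identified perpetrators and the second is its share of the Census population; the dictionary and function names are illustrative only.

```python
# Percentages as quoted in the post: (share of perpetrators, share of population).
shares = {
    "White": (43.0, 88.0),
    "Mixed": (3.8, 1.8),
    "Asian": (33.0, 6.7),
    "Black": (19.0, 2.8),
}

def representation_ratio(observed_pct, expected_pct):
    """Observed share divided by the share expected from population size."""
    return observed_pct / expected_pct

ratios = {group: representation_ratio(obs, exp) for group, (obs, exp) in shares.items()}

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f} times the expected share")
```

A ratio of 1 would mean a group appears exactly as often as its population share predicts; the quoted figures give roughly 0.49, 2.11, 4.93 and 6.79.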

As regards the broader question of the national extent of sexual exploitation, the headline annual figure of 16,500 victims is not case-based, but is inferred from signs of disturbed behaviour, which of course may be due to factors other than sexual exploitation. It is no more than a tentative indication, and it would be unwise to base decisions on methods which lack detail and require peer review.

Thursday, 6 December 2012

My next post was entitled “Icebergs and Onions” and is more general. It explains the options for estimating prevalence.

Tuesday, 14 May 2013

Reporting on child abuse Part 2


Some lads from Oxford have been sent to jail for child sexual abuse. The details of their treatment of vulnerable young girls do not bear repeating. The BBC, the main conduit of news for the respectable classes, makes no mention of a key issue: the background of the abused girls. The pictures of these loathsome characters reveal some features in common, but nothing about the features of the girls, who of course cannot be pictured in this appalling story. They seem to have been, by a significant majority, white girls from poor and disturbed backgrounds, possibly of low intelligence. Mark Easton, in a somewhat more pensive BBC article attached to the main story, reflecting on some of the conclusions of a Children's Commissioner's report, says "Black and particularly Asian perpetrators remain over-represented".
I have previously blogged about this study, which I found was somewhat evasive about the data collected, so to save you searching for it I will copy out the relevant paragraph.
the Commission’s own figures show that only half the predicted number of White perpetrators were actually found (43% versus 88%), twice the number of “mixed” ethnicity (3.8% versus 1.8%), almost 5 times the number of Asians (33% versus 6.7%) and almost 7 times the number of Blacks (19% versus 2.8%).
Responsible reporting should be accurate, not evasive.

Wednesday, 15 May 2013

Sex gangs, crime statistics and euphemistic generalities

Some men abuse young girls, selecting those who are vulnerable and in care homes in order to have sex with them, and to make money out of their earnings as prostitutes. They flatter these poor girls with protestations of love, shower them with gifts, then ply them with alcohol and drugs, and treat them with barbaric cruelty, leaving them injured, bewildered and pathologically dependent on them.

On this morning’s Today program, the best current affairs radio show in England, the redoubtable John Humphrys, castigator of political pontificators and assorted slimy toves, did one of his persistent, pressing and thorough interviews with Chief Constable Sara Thornton, who struggled to explain why this particular grooming gang had been able to operate in Oxfordshire since 2006 with impunity.

Retrospect is the investigative journalist’s strongest card. We now know there was a gang, a conspiracy, but it was not known then. A number of disturbed and abused young teenagers reported to the Police what had happened to them, but the Police “did not join up the dots”. They saw instead a series of individual cases of abused, unreliable witnesses, at least one of whom could not bear to repeat out loud in a public court what had been done to her, so the case collapsed. What Sara Thornton did not say, as she descended into the pit of unconvincing explanations (leading to the traditional “Are you going to resign” ending) was that Police work is usually a mass of dots, mostly of personal tragedies and gross mischief that have no connection whatsoever, other than that some humans behave in an inhumane manner, and that there is no end to such barbarity.

Humphrys asked whether some of the abusers could not have been followed perpetually until some hard evidence of abuse was obtained. A reasonable question it would seem. However, to follow a person in such a way requires three teams a day, and a lot of assets, as perpetrators drive large distances by car from one assignation to the next, with an apparently willing abused girl sitting in the car with them. Given that the informal estimate of the number of active Jihadist would-be bombers in the UK is 2000 persons at any time, resources are scarce. Following suspects is easy in films, and very complicated and expensive in real life.

Anyway, the Oxfordshire Police now have a new unit dedicated to catching these sorts of abusers. General comment on the crime has been muted, with much repetition of Deputy Children's Commissioner Sue Berelowitz’s remark that the 'model' of Asian men targeting white girls was just one of 'a number of models'. This is the educated person’s version of “there is good and bad in all races”.

I have two gripes about their report. The first is that they did not use a range of methods to estimate the number of children at risk (see Icebergs and Onions).  It is a technically difficult area, but they did some simple extrapolations, and did not use the better validated capture-recapture method, which in this case would have resulted in a better estimate.  Here is what I said in that post:

“Using some data on sexually abused children provided by “The Office of the Children’s Commissioner’s Inquiry into Child Sexual Exploitation in Gangs and Groups”, I have tried to work out, from their figures, the numbers of children who go missing. The report is difficult to follow, and I asked if they had a technical appendix two weeks ago, to no avail so far. Assume their Venn Diagram 1 on page 71 is a vague guide to the missing rates in some local authorities.

Police have netted 5611 names of missing children, Local Authorities 1256, with an overlap of 1508 of children where both agencies agree that the child is missing. How many children are really missing? Using the Lincoln-Petersen method there are 4,673 missing children.”
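
The Lincoln-Petersen estimate in the quoted passage is the product of the two list sizes divided by the number of cases found on both lists. The Python sketch below reproduces that arithmetic from the quoted figures. The function name is illustrative, and the second calculation rests on an unverified assumption about how the Venn diagram should be read.

```python
def lincoln_petersen(n1, n2, overlap):
    """Estimate population size from two lists and the count appearing on both."""
    if overlap <= 0:
        raise ValueError("the estimator needs at least one case on both lists")
    return n1 * n2 / overlap

# Figures exactly as quoted above: police list, local authority list, overlap.
estimate = lincoln_petersen(5611, 1256, 1508)
print(round(estimate))  # 4673

# ASSUMPTION (not checked against the report): if the three Venn figures were
# counts for the non-overlapping regions, each list total would include the overlap.
alternative = lincoln_petersen(5611 + 1508, 1256 + 1508, 1508)
print(round(alternative))  # 13048
```

The estimator normally expects the overlap to be no larger than either list, which is why the alternative reading is shown alongside the quoted one. Which reading matches the report cannot be settled from the figures given here.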

The second gripe is that they did not properly compare the race of perpetrators with the racial composition of the country so as to get a crime rate per racial group. They have still not replied to my enquiry about their statistics and methods, but are still trotting out the same old line about “different models”. The differences between different ethnic groups are considerable, and should be discussed (see posting “Reporting on child abuse Part 2”).  The whole report is due for a thorough statistical re-analysis.

The gang operating in Oxfordshire were 5 Pakistanis and 2 North Africans. No Sikhs or Indians or Chinese in this particular case. By the way, the accepted phrase used now is “Pakistani heritage”.  One cannot estimate crime rates from a single court case, nor necessarily from several such cases, but the Commissioner’s own statistic would place the “Asian” perpetrator rate at 5 times the expected population value.  Statistics like that, if found in cancer research, would trigger a health warning, and the usual flurry of articles suggesting we all needed to change our diets or lifestyles.

At heart this is disproportionately a problem about policing some minorities within minorities. We need to be able to say that only an infinitesimal segment of those ethnic minorities commit such crimes, whilst also reporting that that very small rate varies significantly from one group to another. Open reporting of ethnicity and other background details should be the norm in a free society.

Tuesday, 26 August 2014

Woodley elaborates


What I said in London was that whilst it is true that correlation does not necessarily equate to causation, all causally related variables will be correlated. Thus correlation is always necessary (but not in and of itself sufficient) for establishing causation.

The claim that 'correlation does not equal causation' is therefore meaningless when used to counter the results of correlative studies in which specific causal inferences are being made, as the inferred pattern of causation necessarily supervenes upon correlation amongst variables. Whether the variables being considered are in actuality causally associated as per the inference is another matter entirely.

The correct critique of such findings therefore is from mediation, i.e. the idea that a given correlation might be spurious owing to the presence of 'hidden' variables that are generating the apparent correlation. A famous example is yam production and national IQ, which across countries correlate negatively. It would be wrong to say that yam production somehow inhibits IQ, as the association will in fact turn out to be mediated by something like temperature and latitude. These variables are in turn proxies for historical and ecological trends that make the sort of countries that yield fewer yams the sort of countries that are typically populated by higher ability people, and vice versa. The causation in this case is via additional variables, which cause the covariance between the two variables of interest, without there being a direct effect of one on the other.
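
A small simulation shows how a hidden variable can produce this kind of correlation with no direct effect between the two variables of interest. It is a minimal Python sketch on invented data: the variable names echo the yam example but the numbers are random draws, and the effect sizes are arbitrary assumptions.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
hidden = [rng.gauss(0, 1) for _ in range(n)]      # stands in for climate or latitude
yams = [h + rng.gauss(0, 1) for h in hidden]      # depends on the hidden variable only
ability = [-h + rng.gauss(0, 1) for h in hidden]  # depends on the hidden variable only

raw = pearson(yams, ability)                       # clearly negative
controlled = partial_corr(yams, ability, hidden)   # close to zero
print(f"raw r = {raw:.2f}, controlling for hidden variable r = {controlled:.2f}")
```

Neither simulated variable acts on the other, yet they correlate at about -0.5 until the hidden variable is controlled for, at which point the association all but disappears.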

Properly constructed multivariate models can use these patterns of mediation to infer the likelihood of causation going in one direction or another. Thus it is possible to actually test causal inference amongst a population of correlated variables. By far the best way of doing this is to compare the fits of models containing specific theoretically prescribed patterns of causal inference against (preferably many) alternative theoretically plausible models, in which alternative patterns of causation are inferred (Figueredo & Gorsuch, 2007).

Sir William Gemmell Cochran termed this “Fisher’s Dictum”:

“About 20 years ago, when asked in a meeting what can be done in observational studies to clarify the step from association to causation, Sir Ronald Fisher replied; ‘Make your theories elaborate.’ The reply puzzled me at first, since by Occam's razor, the advice usually given is to make theories as simple as is consistent with known data. What Sir Ronald meant, as subsequent discussion showed, was that when constructing a causal hypothesis one should envisage as many different consequences of its truth as possible, and plan observational studies to discover whether each of these consequences is found to hold.” (Cochran, 1965, §5).


Cochran, W. G. (1965). The planning of observational studies of human populations (with Discussion). Journal of the Royal Statistical Society. Series A, 128, 134–155.
Figueredo, A. J., & Gorsuch, R. L. (2007). Assortative mating in the jewel wasp. 2. Sequential canonical analysis as an exploratory form of path analysis. Journal of the Arizona-Nevada Academy of Science, 39, 59–64.

The Woodley Challenge


For some time now I have been getting tired of the “correlation is not causation” mantra. This slogan is true as far as it goes, but it tends to be used so as to argue that, despite many correlations linking A with B being found in different circumstances, these will somehow never suffice to strongly suggest a causal link between A and B. On the contrary, I argue that correlation is a necessary feature of causation, but not a sufficient proof. I want to change the slogan to: Correlation is not always causation, but it helps find causes.

In doing all this I half-remembered a challenge set by Michael Woodley at the London Conference on Intelligence last April, so after getting the wording from him again, I thought I would bring it to a wider audience:

"Sure, correlation does not equal causation, but find me just one single instance of a causal relationship where there is no correlation (just one would suffice)."

As befits a challenge, I will be offering the traditional bottle of wine to the best instance. Woodley judges, I arbitrate if required, and provide the bottle of wine.

Monday, 25 August 2014

Depraved on account of being deprived?


In West Side Story Stephen Sondheim set out the theories of juvenile delinquency with more clarity, and certainly more brevity, than the academics who had dreamed them up. A prominent theory in sociological circles is that crime arises from poverty and consequently that the alleviation of poverty by paying social benefits should diminish criminality.

The link between poverty and crime has been demonstrated repeatedly, and recently confirmed for the USA and Norway. Repetition of a correlation impacts academic and public opinion. However, as we are all wearily aware, correlation is not causation, though in ordinary life it damn well implies it. Correlation is a necessary feature of causation, but not a sufficient proof. The quip should be altered to: correlation is not always causation, but it helps.

This link has been investigated, in a different way, by a gang of sociologists led by Amir Sariaslan (ex-Uppsala) and his colleagues at the great Karolinska in Sweden, the country of Volvo, Saab (RIP), Bofors guns, Primus stoves, interminable Bergman movies, winter candles on the streets of gamla gatan, pacifism, social welfare, and obsessional Scandinavian epidemiology. The last of these has proved a redeeming feature.

Childhood family income, adolescent violent criminality and substance misuse: quasi-experimental total population study. Amir Sariaslan, Henrik Larsson, Brian D’Onofrio, Niklas Langstrom and Paul Lichtenstein. British Journal of Psychiatry.

Published online ahead of print August 21, 2014, doi:10.1192/bjp.bp.113.136200

Children of parents in the lowest income quintile experienced a seven-fold increased hazard rate (HR) of being convicted of violent criminality compared with peers in the highest quintile (HR = 6.78, 95% CI 6.23–7.38). This association was entirely accounted for by unobserved familial risk factors (HR = 0.95, 95% CI 0.44–2.03). Similar pattern of effects was found for substance misuse.

The authors point out:

Behavioural genetic investigations have found that the liabilities for both violent offending and substance misuse are substantially influenced by shared genetic and, to a lesser extent, family environmental factors [7, 8].

7 Frisell T, Lichtenstein P, Langstrom N. Violent crime runs in families: a total population study of 12.5 million individuals. Psychol Med 2011; 41: 97–105.

8 Kendler KS, Sundquist K, Ohlsson H, Palmér K, Maes H, Winkleby MA, et al. Genetic and familial environmental influences on the risk for drug abuse: a national Swedish adoption study. Arch Gen Psychiatry 2012; 69: 690–7.

We linked data from nine Swedish, longitudinal, total-population registers maintained by governmental agencies. The linkage was possible through the unique 10-digit civic registration number assigned to all Swedish citizens at birth and to immigrants upon arrival to the country.

The final sample (omitting multiple-births, death, severe handicap and emigrants) consisted of 88.6% of the targeted population (n = 526 167). The sample included 262 267 cousins and 216 424 siblings nested within 114 671 extended and 105 470 nuclear families.

We calculated mean disposable family income (net sum of wage earnings, welfare and retirement benefits, etc.) of both biological parents for each offspring and year between 1990 and 2008. Income measures were inflation-adjusted to 1990 values according to the consumer price index provided by Statistics Sweden.

Gender, birth year and birth order were included in all models. We also adjusted for highest parental education and parental ages at the time of the first-born child, and parental history of ever being admitted to hospital for a mental disorder.

Violent crime was defined as a conviction for homicide, assault, robbery, threats and violence against an officer, gross violation of a person’s/woman’s integrity, unlawful threats, unlawful coercion, kidnapping, illegal confinement, arson, intimidation, or sexual offences (rape, indecent assault, indecent exposure or child molestation, but excluding prostitution, hiring of prostitutes or possession of child pornography).

The participants entered the study at their fifteenth birthday and were subsequently followed up for a median time of 3.5 years. The maximum follow-up time was 6 years.

This is a short time to pick up the full flowering of criminal careers, so perhaps it should be considered an under-estimate, or purely a measure of juvenile delinquency and not of lifetime criminality (which usually lasts until middle age).

Readers will know that I cast a particularly baleful eye over all “corrections” and “adjustments” but in this paper the techniques are transparent, and have an intrinsic justification. The data allows them to compare siblings with cousins, and intact nuclear families with more scattered ones: two natural experiments which allow contrasts of shared genes and experience. Crafty. That is my summary, but here is their explanation in detail:

We fitted two separate models for the entire sample (n = 526 167) that gradually adjusted for observed confounding variables. Model I adjusted for gender, birth year and birth order, whereas Model II also adjusted for highest parental education, parental ages at the time of the first-born child and parental history of admission to hospital for a mental disorder.

To assess the effects also of unobserved genetic and environmental factors, we fitted stratified Cox regression models to cousin (n = 262 267) and sibling (n = 216 424) samples with extended or nuclear family as stratum, respectively. The stratified models allow for the estimation of heterogeneous baseline hazard rates across families and thus capture unobserved familial factors. This also implies that exposure comparisons are made within families. Model III was fitted to the cousin sample and adjusted for observed confounders and unobserved within extended-family factors. Model IV was fitted on the sibling sample and accounted for unobserved nuclear family factors and for gender, birth year and birth order. Cousin and sibling correlations on the exposure variable were calculated based on a varying-intercepts, mixed-effects model where the intercepts are allowed to vary across families.

The magnitude of the variation was expressed as an intra-class correlation (ICC). The ICC measures the degree to which observations are similar to one another within clusters; in this case cousins and siblings nested within extended and nuclear family clusters. The measure ranges between 0 and 1, where the latter implies that cousins and siblings have identical exposure values within families.
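
To make the ICC concrete, here is a minimal Python sketch. It swaps in the simpler one-way ANOVA estimator for equally sized clusters in place of the mixed-effects model the authors describe, and the family income values are invented. Unlike the variance-component version, this estimator can dip below 0 in small samples.

```python
from statistics import mean

def icc_oneway(clusters):
    """One-way ANOVA estimate of the ICC for equally sized clusters."""
    k = len(clusters[0])
    if any(len(c) != k for c in clusters):
        raise ValueError("this sketch assumes equally sized clusters")
    n = len(clusters)
    grand = mean(v for c in clusters for v in c)
    # Variation of family means around the grand mean.
    ms_between = k * sum((mean(c) - grand) ** 2 for c in clusters) / (n - 1)
    # Variation of individuals around their own family mean.
    ms_within = sum((v - mean(c)) ** 2 for c in clusters for v in c) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical income exposure for sibling pairs in three families.
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # siblings share the same value
similar = [[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]]    # siblings differ a little

print(icc_oneway(identical))          # 1.0
print(round(icc_oneway(similar), 3))  # 0.882
```

An ICC near 1 means siblings or cousins barely differ in exposure, which limits how much within-family contrast the stratified models have to work with.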




As you can see, each model picks away at what would otherwise be seen as a purely economic cause of criminality and drug abuse. Model II, which adjusts for parental education and mental illness, has a big effect.

In an unusual departure, The Economist devoted an article to this paper, which suggests that they are beginning to wake up to the human factors in economics. Admittedly, they sub-titled it “A disturbing study of the link between incomes and criminal behaviour”, suggesting they were disturbed. Here are The Economist’s conclusions:

That suggests two, not mutually exclusive, possibilities. One is that a family’s culture, once established, is “sticky”—that you can, to put it crudely, take the kid out of the neighbourhood, but not the neighbourhood out of the kid. Given, for example, children’s propensity to emulate elder siblings whom they admire, that sounds perfectly plausible. The other possibility is that genes which predispose to criminal behaviour (several studies suggest such genes exist) are more common at the bottom of society than at the top, perhaps because the lack of impulse-control they engender also tends to reduce someone’s earning capacity.

Neither of these conclusions is likely to be welcome to social reformers. The first suggests that merely topping up people’s incomes, though it may well be a good idea for other reasons, will not by itself address questions of bad behaviour. The second raises the possibility that the problem of intergenerational poverty may be self-reinforcing, particularly in rich countries like Sweden where the winnowing effects of education and the need for high levels of skill in many jobs will favour those who can control their behaviour, and not those who rely on too many chemical crutches to get them through the day. 

This is only one study, of course. Such conclusions will need to be tested by others. But if they are confirmed, the fact that they are uncomfortable will be no excuse for ignoring them.

What The Economist might have said is: Since this is a total population study of five birth cohorts and is the largest by far in the literature, it has high credibility, and the result will stand until another study of equal quality finds otherwise.

Sunday, 24 August 2014

Does reading make kids more intelligent?


In my view, the most that can be said for reading is that it reduces the possibility that you might go outside and break a leg playing sports. Certainly that is a valuable contribution to civilization, but some researchers have made the further claim that reading boosts intelligence. Will the intelligence boosting dream never die? Take one large bright pill, follow a convoluted set of mental exercises and then, wham, arise to the soaring heights of genius (and eventually fall from grace), as in that heart-rending classic, Flowers for Algernon.

The latest paper on this matter is far more crafty than that.

Stuart J. Ritchie, Timothy C. Bates and Robert Plomin (2014) Does Learning to Read Improve Intelligence? A Longitudinal Multivariate Analysis in Identical Twins From Age 7 to 16. Child Development. 24 JUL 2014 DOI: 10.1111/cdev.12272


The authors eschew get-bright-quick schemes and concentrate on whether humble reading ability precedes later increases in intelligence in identical twins. Their core argument is that if one identical twin reads better than their identical twin brought up in the same household, then the difference is unlikely to be their identical genetics, so more likely to be a specific (not shared) environmental effect of some sort. Whatever it is, if they can show it boosts intelligence, then this would be a proven example of the environmental influence of reading ability (due to teaching techniques perhaps) on intelligence, and a possible pathway to boosting intelligence through education.

Here is their plan: Using a longitudinal monozygotic (MZ) twin differences design, we test whether twins who—for purely environmental reasons—acquire better reading skills than their co-twin show improvements in intelligence, and whether these associations are found across five waves of testing. Such a finding would have implications for educational interventions, and may also provide a partial answer to the important question of why children within a family have very different intelligence test scores, despite sharing factors such as genes, parental education, parental personality, and socioeconomic status (Plomin, 2011; Plomin & Daniels, 1987).

I hope that, so far, all this makes sense, but since the authors are clever monkeys the plot gets thicker, and there are many paths in infinite space to be tracked down before clarity ensues. Here is Fig 3 showing the significant associations:





The finding is that Reading Ability Difference at age 7 is related to IQ Difference at ages 9 and 10, and that Reading Ability Difference at age 10 is related to IQ Difference at age 12 and Reading Ability Difference at age 12 is related to IQ difference at age 16. It certainly looks as if the twin with the better reading goes on to get better intelligence test results a few years later.
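The logic of the twin-differences design can be sketched in a few lines. What follows is a deliberately simplified illustration, not the authors' model (they fitted a longitudinal multivariate model across five waves): take the within-pair difference in reading at one age, the within-pair difference in IQ at a later age, and correlate the two. The function names and every score below are invented purely for illustration.

```python
from math import sqrt


def pair_differences(twin_a, twin_b):
    """Within-pair differences (twin A minus twin B) for each twin pair."""
    if len(twin_a) != len(twin_b):
        raise ValueError("each pair needs a score for both twins")
    return [a - b for a, b in zip(twin_a, twin_b)]


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented scores for five hypothetical MZ pairs (NOT data from the paper).
reading_7_a = [52, 48, 55, 60, 45]
reading_7_b = [50, 49, 51, 57, 47]
iq_9_a = [101, 97, 108, 110, 95]
iq_9_b = [100, 99, 103, 108, 98]

reading_diff_7 = pair_differences(reading_7_a, reading_7_b)
iq_diff_9 = pair_differences(iq_9_a, iq_9_b)
r = pearson_r(reading_diff_7, iq_diff_9)
```

Because identical twins reared together share genes and household, a positive `r` in real data would point to something in the non-shared environment, which is the argument the authors are making.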

Table 1 shows the raw results for the particular tests used. The sample sizes, means and standard deviations for the IQ subtests are shown. These tests were administered individually by telephone, using a booklet mailed to the twins’ home prior to testing (Petrill, Rempell, Dale, Oliver, & Plomin, 2002). Stuart Ritchie has kindly sent me the distributions for the 7 year olds, and it looks like they did less well on Similarities than the other tests, but this must be seen in the context of their being raw scores, not standardised scores:

All the scores in the descriptive table are raw scores, but when they go into the model, they are controlled for age and sex and standardized (the model has no mean structure, just correlations). The tests used in TEDS are not the exact ones from Wechsler - they’re often based on those tests, but with some extra items added to make them harder for the older age groups. I’m not sure on what basis this was done, but obviously the decision was made at some point that they had to keep scaling up the difficulty rather than using the standard, normed tests.

The problem with this, as hinted above, is that you can’t use any kind of latent growth curve approach (like we do regularly in the Lothian Birth Cohort data) - because the tests aren’t the same at each time point, you can’t compare means. It’s harder to do this with kids of course, who are getting better with age rather than worse. Elliot Tucker-Drob, in his Texas Twin sample, always makes sure that there’s one identical test that overlaps two of the waves, so that there’s something to anchor the growth curve. I’m not actually certain how this works in practice, but in the next few years I suppose his Texas Twins papers will be appearing and we’ll find out...

The authors conclude: The present study provided compelling evidence that improvements in reading ability, themselves caused purely by the non-shared environment, may result in improvements in both verbal and nonverbal cognitive ability, and may thus be a factor increasing cognitive diversity within families (Plomin, 2011). These associations are present at least as early as age 7, and are not—to the extent we were able to test this possibility—driven by differences in reading exposure. Since reading is a potentially remediable ability, these findings have implications for reading instruction: Early remediation of reading problems might not only aid in the growth of literacy, but may also improve more general cognitive abilities that are of critical importance across the life span.

Of course, at the beginning of this essay I may have been a little too harsh in my bleak evaluation of the benefits of reading. This was probably because I did not think that reading increased the power of the intellect. I felt that the size of the cognitive engine remained the same. However, education may act as a gearbox, applying a set of skills to maximise the use of the engine.

In his great poem of AD 835 “A mad poem addressed to my nephews and nieces” Po Chu-I begins with a couplet on this very matter:

The World cheats those who cannot read; I, happily, have mastered script and pen

So, it may be that lack of reading cheats children of (some) intelligence.

Wednesday, 20 August 2014

Comparisons are onerous N=1,000,000


Can you remember back in ancient history when school exam questions said: “Compare and contrast”? I found this philosophically interesting, in that I was tempted to compare and contrast the epistemological foundations of comparing and contrasting. More to the point, can you remember back to your undergraduate days when you learnt that each contrast and comparison used up some of your luck? I have put this in a dramatic and personal form to capture the dismay I felt when I understood that at least one of the positive t test results I had so painfully calculated was probably a fluke. I decided it was always the twentieth one which had led me astray, the early ones having first mover advantage in capturing the explanatory narrative, and becoming cherished for ever after, the first-born causes.

The problems of multiple contrasts arise in any even mildly complicated data set. Consider a test with 100 items in which you choose to compare each item with each other item in a t test. Doing multiple comparisons will throw up many spurious results, and you won’t know which is a false positive and which is true.

Now consider a test with 1000 items. Multiple comparison will create a large number of errors of identification. There are ways of correcting for these multiple comparisons and contrasts, but they are always something of a patch and fix. The better strategy is to increase sample size.
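To put rough numbers on this, here is a small sketch of the arithmetic. It assumes the conventional significance level of 0.05, that every null hypothesis is true, and (for the family-wise figure) that the tests are independent, none of which will hold exactly in a real data set. The function names are my own.

```python
from math import comb


def pairwise_comparisons(n_items):
    """Number of distinct item-versus-item comparisons."""
    return comb(n_items, 2)


def expected_false_positives(n_tests, alpha=0.05):
    """Expected spurious 'significant' results if every null is true."""
    return n_tests * alpha


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold under the (conservative) Bonferroni correction."""
    return alpha / n_tests


tests_100 = pairwise_comparisons(100)    # 4950 comparisons
tests_1000 = pairwise_comparisons(1000)  # 499500 comparisons
spurious_100 = expected_false_positives(tests_100)
```

On these assumptions, 100 items yield 4950 comparisons and roughly 250 flukes, and even twenty tests carry a better than even chance of at least one. Bonferroni is one of the "patch and fix" corrections: it tames the false positives by making each individual test much harder to pass.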

The genome has a very large number of “scores” of interest, some more obvious to identify and measure than others. Deciding what is score and what is junk is not a trivial matter. Finding false positives is easy, finding true positives which replicate much harder. James Lee from the University of Minnesota told me in 2009 that his preliminary estimate of the sample sizes suggested that 100,000 was a likely starting point for dependable results, but that it could be higher. A few years is a long time in genomic analysis but now Steve Hsu has been thinking about this, and has published his conclusions, naming James Lee as one of the researchers whose work has influenced him.

I describe some unpublished results concerning the genetic architecture of height and cognitive ability, which suggest that roughly 10k moderately rare causal variants of mostly negative effect are responsible for normal population variation. Using results from Compressed Sensing (L1-penalized regression), I estimate the statistical power required to characterize both linear and nonlinear models for quantitative traits. The main unknown parameter s (sparsity) is the number of loci which account for the bulk of the genetic variation. The required sample size is of order 100s, or roughly a million in the case of cognitive ability.

The paper is attractive for covering the background to the genetics of intelligence in a clear and succinct format. Steve Hsu talks about the reduced cost of sequencing the genome, which is speeding up research; the heritability of intelligence; the Flynn effect; exceptional intelligence; and additive genetic models.

One might say that to first approximation, Biology = linear combinations of nonlinear gadgets, and most of the variation between individuals is in the (linear) way gadgets are combined, rather than in the realization of different gadgets in different individuals.

I like the word gadgets. That is the sort of genetics I understand. Alleles be damned.

Pairs of individuals who were both below average in stature or cognitive ability tended to have more SNP changes between them than pairs who were both above average. This result supports the assumption that the minor allele (–) tends to reduce the phenotype value. In a toy model with, e.g., p = 0.1, N = 10k, an individual with average phenotype would have 9k (+) variants and 1k (–) variants. A below average (-3 SD) person might instead have 1100 (–) variants, and an above average individual (+3 SD) 900 (–) variants. The typical SNP distance between genotypes with 1100 (–) variants is larger than that for genotypes with 900 (–) variants, as there are many places to place the (–) alleles in a list of 10k total causal variants. Two randomly chosen individuals will generally not overlap much in the positions of their (–) variants, so each additional (–) variant tends to increase the distance between them.
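The toy model is simple enough to check by hand, or with a few lines of code. The sketch below makes two simplifying assumptions of my own: each of the N loci is treated as a single (+) or (–) slot, and an individual's (–) variants are scattered uniformly at random over those slots. The function names are hypothetical.

```python
from math import sqrt


def minus_variant_sd(n_loci, p):
    """Binomial SD of the number of (-) variants across individuals."""
    return sqrt(n_loci * p * (1 - p))


def expected_distance(k, n_loci):
    """Expected number of loci at which two individuals differ.

    Each individual carries k (-) variants placed uniformly at random
    among n_loci slots, so they share about k*k/n_loci of them.
    """
    return 2 * k * (1 - k / n_loci)


N_LOCI = 10_000
P_MINUS = 0.1

sd = minus_variant_sd(N_LOCI, P_MINUS)         # about 30 variants
low_pair = expected_distance(1100, N_LOCI)     # two below-average people
high_pair = expected_distance(900, N_LOCI)     # two above-average people
```

One SD comes out at about 30 variants, so 3 SD is about 90, which the quoted passage rounds to 1100 and 900 (–) variants. Under these assumptions the two below-average individuals differ at roughly 1958 loci and the two above-average individuals at roughly 1638, which is the pattern described.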

The content of the basic calculation as to how much any species can be improved underlies the work of animal and plant breeders. As leading population geneticist James Crow of Wisconsin wrote [14]:

The most extensive selection experiment, at least the one that has continued for the longest time, is the selection for oil and protein content in maize (Dudley 2007). These experiments began near the end of the nineteenth century and still continue; there are now more than 100 generations of selection. Remarkably, selection for high oil content and similarly, but less strikingly, selection for high protein, continue to make progress. There seems to be no diminishing of selectable variance in the population. The effect of selection is enormous: the difference in oil content between the high and low selected strains is some 32 times the original standard deviation.

Hsu’s point is to show that as regards intelligence, humans have not reached their upper limit.

His section on compressed sensing is interesting, but I cannot judge it, so leave that to you, dear reader. However, Hsu is clear that a sample size of a million persons will be required. On the upside, that should lead to genetic predictions of IQ accurate to about 8 IQ points. It would also lead to parents being able to choose the brightest of their fertilized eggs. Interesting times.

From the purely scientific perspective, the elucidation of the genetic architecture of intelligence is a first step towards unlocking the secrets of the brain and, indeed, of what makes humans unique among all life on earth.

Monday, 18 August 2014

Paper for private study

Satoshi Kanazawa has given me a link (above) to his intelligence and stability of happiness paper for those of you who might like to look at it, for private study.

Higher education in Finland


Professor Jari Litmanen of the University of Jään Luola has written in from the frozen wastes of Finland, where I can recall spending a good holiday many years ago, to give an account of higher education in Finland.


1 What do you think of the quality of education in your university and in your country?

In Finland, there is a high average (because there is a high average IQ) but it is very conformist. It does not encourage original thinking, but simply learning what other people have said. For example, it makes no difference in your career prospects at all if you pass or are outstanding. So, the system does not encourage brilliance. 

2 Which circumstances encourage or prevent your university from educating students to a high level?

There is little incentive to do more than pass, unless you want to do a PhD. In these circumstances, there is no incentive to get more than a 'good' mark in your Masters. Students can retake exams as many times as they like and many academics publish in Finnish or in English journals in Scandinavia. These tend to have low standards and impact. In terms of encouragement, social status is based around education and you're nobody without a Masters, so there's a strong incentive to get one. 

3 How many of your students are able to follow “College Format”, which means that although they attend lectures they can also learn based on gathering and inferring their own information, and establishing and applying general principles rather than following checklists?

My experience is that they do this to a lesser extent than, for example, students at Oxford or Durham. Perhaps in every class of 30 there might be one or two students. I've taught in a number of departments and I found there were more of these, maybe 4/30, in English Dept. than Anthropology, probably because the English Dept. is more selective. My research indicates that the IQ range in Finland is the narrowest in Europe, by the way.  

4 Does your university recognise that students have different levels of ability, and factor that into exam results and student opinions about the teaching they receive?


5 Are you allowed to set demanding examinations, even if many students fail your test and some are asked to leave the university?


6 Are you allowed to give extra attention to your brightest students, including additional seminars and research work?

On your own dime, yes. 

7 Does your university recognise that university staff have different levels of ability?


8 Do you feel able to teach about group differences in ability without negative consequences to your career?

Yes, to Finns. Finns dislike foreigners, in the main. I'm not sure how these would go down to a group of US international students. 

9 Are there other aspects of university standards which are relevant to the overall quality of the education provided to students?

One good thing in Finland is that it is quite difficult to get into university. Only about 1 in 3 that apply get in anywhere. So, I think the intellectual range is narrower than in the UK, even if I compare this university to Ancient University in England. There were some students at Ancient University, studying things like Education or Sociology, who were very stupid. I've never come across anything like it at Jään Luola among the university students. 

Sunday, 17 August 2014

The intelligent pursuit of happiness

Happiness is what many people say they want, and it certainly ranked high in the minds of the authors of the American Declaration of Independence, which may be a recommendation, or a warning. Centuries later psychologists have joined in the pursuit, swimming into the waters formerly infested by philosophers to make their helpful suggestions: count your blessings, set your expectations low, love your neighbour unless they are married to someone else, take each day as it comes, live for the moment, and never let a fatuous banality remain unrepeated.

As you may detect, from time to time I have tried to take a positive view of life, but felt too gloomy to carry out all the uplifting exercises with the required conscientiousness. Perhaps, other than having a tragic sentiment towards life as Miguel de Unamuno so aptly described it, I was aware from Lykken’s work that happiness levels have a homeostatic quality, and tend to oscillate around a personal mean in the long term, the absolute level of which has a genetic component.

It was with gloomy interest that I came across a paper which has tracked happiness estimates long term, and linked them with other personal characteristics such as personality and intelligence.

In “Why is intelligence associated with stability of happiness” British Journal of Psychology (2014) 105, 316-337 Satoshi Kanazawa looked at life course variability in happiness in the National Child Development Study over 18 years.

In the National Child Development Study, life-course variability in happiness over 18 years was significantly negatively associated with its mean level (happier individuals were more stable in their happiness, and it was not due to the ceiling effect), as well as childhood general intelligence and all Big Five personality factors (except for Agreeableness). In a multiple regression analysis, childhood general intelligence was the strongest predictor of life-course variability in life satisfaction, stronger than all Big Five personality factors, including Emotional stability. More intelligent individuals were significantly more stable in their happiness, and it was not entirely because: (1) they were more educated and wealthier (even though they were); (2) they were healthier (even though they were); (3) they were more stable in their marital status (even though they were); (4) they were happier (even though they were); (5) they were better able to assess their own happiness accurately (even though they were); or (6) they were better able to recall their previous responses more accurately or they were more honest in their survey responses (even though they were both). While I could exclude all of these alternative explanations, it ultimately remained unclear why more intelligent individuals were more stable in their happiness.

Kanazawa reviews the literature, and sets out some expectations: Childhood general intelligence is significantly positively associated with education and earnings; more intelligent individuals on average achieve greater education and earn more money. Intelligence [low] also predicts negative life events, such as accidents, injuries, and unemployment. If more intelligent individuals exercise greater control over their life circumstances, because their resources protect them from unexpected external shocks in their environment, then we would expect more intelligent, more educated and wealthier individuals to experience less variability in their subjective well-being over time. Studies in positive psychology generally show that individuals return to their baseline ‘happiness set point’ after major life events, both positive and negative. So, if less intelligent, and thus less educated and wealthy, individuals experience more negative life events, which temporarily lower their subjective well-being before they return to their baseline ‘happiness set points’, then they are expected to have greater life-course variability in happiness.

Intelligence is associated with health and longevity, and more intelligent children on average tend to live longer and healthier lives than less intelligent children, although it is not known why. Health is significantly associated with psychological well-being. So, it is possible that more intelligent individuals are more stable in their happiness over time because they are more likely to remain constantly healthy than less intelligent individuals.

The National Child Development Study (NCDS) is a large-scale prospectively longitudinal study, which has followed British respondents since birth for more than half a century. Look on this work, ye mighty, and weep. If you want a monument to these island people, look no further. For no other purpose than wanting to know how to give children good lives, all babies (n = 17,419) born in Great Britain (England, Wales, and Scotland) during 03–09 March 1958 were tested, re-interviewed in 1965 (n = 15,496), in 1969 (n = 18,285), in 1974 (n = 14,469), in 1981 (n = 12,537), in 1991 (n = 11,469), in 1999–2000 (n = 11,419), in 2004–2005 (n = 9,534), and in 2008–2009 by which time they were age 50–51 (n = 9,790). If you want this level of intellectual curiosity and altruistic concern for others, avoid caliphates.

The NCDS has one of the strongest measures of childhood general intelligence of all large-scale surveys. The respondents took multiple intelligence tests at Ages 7, 11, and 16. At 7, they took four cognitive tests (Copying Designs, Draw-a-Man, Southgate Group Reading, and Problem Arithmetic). At 11, they took five cognitive tests (Verbal General Ability, Nonverbal General Ability, Reading Comprehension, Mathematical, and Copying Designs). At 16, they took two cognitive tests (Reading Comprehension and Mathematical Comprehension).

Kanazawa did a factor analysis at each age to compute their general intelligence. All cognitive test scores at each age loaded only on one latent factor, with reasonably high factor loadings (Age 7: Copying Designs = .67, Draw-a-Man = .70, Southgate Group Reading = .78, and Problem Arithmetic = .76; Age 11: Verbal General Ability = .92, Nonverbal General Ability = .89, Reading Comprehension = .86, Mathematical = .90, and Copying Designs = .49; Age 16: Reading Comprehension = .91, and Mathematics Comprehension = .91). The latent general intelligence scores at each age were converted into the standard IQ metric, with a mean of 100 and a standard deviation of 15. Then, he performed a second-order factor analysis with the IQ scores at three different ages to compute the overall childhood general intelligence score. The three IQ scores loaded only on one latent factor with very high factor loadings (Age 7 = .87; Age 11 = .95; Age 16 = .92). He used the childhood general intelligence score in the standard IQ metric as the main independent variable in his analyses of the life-course variability in subjective well-being.
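The conversion into "the standard IQ metric" is just a rescaling: standardise the scores within the sample, then stretch them to a mean of 100 and a standard deviation of 15. Here is a minimal sketch; the factor scores are invented for illustration, the function name is my own, and I am assuming the population (not sample) standard deviation is used, which the paper may or may not have done.

```python
from statistics import mean, pstdev


def to_iq_metric(scores, iq_mean=100.0, iq_sd=15.0):
    """Rescale scores so that the sample has mean 100 and SD 15."""
    m = mean(scores)
    s = pstdev(scores)  # population SD of the sample; an assumption
    if s == 0:
        raise ValueError("scores must vary to be rescaled")
    return [iq_mean + iq_sd * (x - m) / s for x in scores]


# Invented latent factor scores for six hypothetical children.
factor_scores = [-1.2, -0.4, 0.0, 0.3, 0.5, 0.8]
iq_scores = to_iq_metric(factor_scores)
```

The rescaling changes nothing about the ordering of the children or the correlations with other variables; it only puts the numbers on a scale readers recognise.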

Incidentally, it is a general rule that all cognitive tests load on a common factor. They do not have to do so. It’s just the way the results come out. The Big Five Personality Factors were only measured at age 51. Psychologists hadn’t got themselves sufficiently together on the factor analysis of personality 50 years ago when the surveys started. Anyway, personality doesn’t change all that much over the life course.

Questionnaire reports about life satisfaction can be unreliable, but the long term survey has an internal check: respondents had been asked how satisfied with life they expected to be in 10 years’ time, and that estimate could be compared with their actual reports a decade later. Kanazawa found that more intelligent individuals appeared to be slightly better able to predict their future level of happiness than less intelligent individuals. He used the prediction inaccuracies at ages 33 and 42 as proxy measures of the respondent's ability to assess their own current level of subjective well-being accurately. Interestingly, more intelligent NCDS respondents were simultaneously more accurate in their recall and more honest in their responses, assessed by looking at another question about how tall they were, their accuracy and honesty calculated when their heights were actually measured in a later sweep of the survey.

Now to the results. The first point to make, cautiously, is that since this is an excellent, totally representative, large population sample, even small effects will be detected. The figure below shows the effect of intelligence in reducing happiness variability, and that is dramatic enough. The two extreme categories of childhood general intelligence – those with IQs below 75 and those with IQs above 125 – were separated by nearly one full standard deviation in the life-course variability in life satisfaction. However, many of the overall correlations are relatively small.




Although childhood intelligence is the best predictor of stability of happiness, Kanazawa says he does not know why. This is true in terms of the data set, and represents the restraint expected of a researcher. However, as a mere commentator I am allowed to speculate. Given the angry criticism some people have shown Nicholas Wade for speculating about the role of genetics in the development of different societies, this may seem a very hazardous enterprise. Nonetheless, here is my speculation. Intelligence is a resource, and intelligent people know it. They may feel they will be able to overcome problems, or at the very least work round them because of their higher level of ability. This gives them the equivalent of money in the bank, available to deal with a rainy day. So, every reverse can be seen for what it is: a nuisance, not a tragedy. The less able have less in the bank. They cannot dampen down the oscillations in mood brought about by adversity. They meet the big waves in a smaller boat, and have a rougher passage.

Can this speculation be tested? It would predict that all life reverses would be overcome more quickly by intelligent persons, with the possible exception of losing an intellectually demanding job, which would damage their sense of intellectual capital. It should predict a lower rate of suicide, which is against the current findings. I may need to work on this speculation a little further.

Note: Marty Seligman was not harmed in the writing of this post.

Wednesday, 13 August 2014

The Zeitgeist of intelligence


At one time intelligence and intelligence testing were seen as agents of social advancement, bringing opportunity to working class children who had been denied their rights in the strictly rationed educational system. Then the zeitgeist changed, and IQ became an instrument of the devil, a cruel trick played on innocents, condemning them to a lifetime of labelling and incarceration in dreadful jobs.

An intelligence test is a “school far” test, and school exams are a “school near” test. School exams are allowed to test what has been taught in a particular school or particular national syllabus. If you don’t know the material you will not do well on the test. (We leave aside the reality that you may get quite a few marks for sitting down and writing some well-meaning banalities). “School near” tests ought to improve with good teaching, good textbooks, and plenty of practice.

A “school far” test avoids the specific knowledge of what might be taught at one school and may have been left untaught at other schools. Instead it seeks to distil out the basics of problem solving which would be required to deal with generic problems found in any school system. These “school far” tests include aspects of very general knowledge, some tests of vocabulary and comprehension of general social rules and practices, for which reason recent arrivals need time to learn about the habits of the host culture before these particular measures can be taken. Five years is a rule of thumb. Most of the “school far” tests comprise very general reasoning, sequences, path finding, pattern matching and simple processing. They look at pretty basic processes, though they try to use relatively novel surface forms so that schooling will have very little effect on the results.

Given that school far tests are good predictors of school near tests, and of occupational achievements, of life styles and of health and longevity, why is the spirit of the times so against such a finding?

One reason seems to be a misunderstanding about scores. Of themselves, they do not determine outcomes. Even the best indicators have an error term. IQ is the best predictor, but it achieves that accolade because it is the best of a weak bunch of predictors. Predictors are not determiners. A further misunderstanding is that an intelligence score total is the complete description of a person’s ability. Even with the current Wechsler four factorial indexes to give a fuller picture, there is much left out which further and different ability tests can elucidate. Even so, there seems to be an underlying real problem: the score carries an implication that some intellectual feats will probably not be attained. Correct. There is no way round that, though learning about it could be very useful in later career planning.

Then on to an even harder question. Given that there is good data showing that intelligence is heritable, why is the spirit of the times so against such a finding?

Here I think a key misunderstanding is that heritability equals “incapable of being altered in any way”. Favourable environments will lead to greater achievements, though not, in reality, to endlessly greater achievements. Favourable genetics confer many advantages, but in all cases some effort will be required, often a great deal of effort. Nonetheless, phenylketonuria apart, it is often very hard to have an environmental impact on the outcome of inherited characteristics, given uniformly reasonable basic conditions. The much desired “level playing field” reveals the very differing skills of the individual players.

Even an interest in genetics seems to require an explanation, a justification of motives, a ritual of purification in which the miscreant promises the audience that genetic questions are only one of their many interests, and certainly not their main subject of enquiry.

All this does not sit well with the enlightenment, and with Trevelyan’s observation: Disinterested intellectual curiosity is the life blood of real civilization.


Tuesday, 12 August 2014

A distant mirror of Ebola

The current outbreak of Ebola reminds me of Barbara Tuchman’s account of the 14th Century great pestilence in “A Distant Mirror” (MacMillan 1979). Like Ebola it was highly lethal, untreatable and transmitted by bodily fluids. Although it was not realized at the time, bubonic plague was initially transmitted by rats and fleas, and later it infected the lungs of human victims and spread even faster by respiratory infection, and so became almost impossible to contain in centres of population: an aerosol turbo-charged Ebola, in fact.

Here is Tuchman’s description (page 108) of how some Medieval European towns responded to the plague, which had death rates approaching 70% in urban centres, and an overall death rate estimated at about 30% in Europe as a whole.

Stern measures of quarantine were ordered by many cities. As soon as Pisa and Lucca were afflicted, their neighbour Pistoia forbade any of its citizens who might be visiting or doing business in the stricken cities to return home, and likewise forbade the importation of wool and linen. The Doge and Council of Venice ordered the burial on the islands to a depth of at least five feet and organised a barge service to transport the corpses. Poland established a quarantine at its frontiers which succeeded in giving it relative immunity. Draconian measures were adopted by the despot of Milan, Archbishop Giovanni Visconti, head of the most uninhibited ruling family of the 14th century. He ordered that the first three houses in which the plague was discovered were to be walled up with their inhabitants inside, enclosing the well, the sick, and the dead in a common tomb. Whether or not owing to his promptitude, Milan escaped lightly in the roll of the dead.

Frankly, given that in 1347 Europeans had no idea how any disease was transmitted, let alone bubonic plague, prompt disposal of corpses and quarantine worked pretty well. Some Medieval Europeans worked out the basics of the pestilence from sharp observation, and then implemented the necessary preventive steps without hesitation or deviation. The towns that did so survived better.

We need to be careful in our comparisons: Europe as a whole suffered greatly, and could have done much better if the vectors of the plague had been understood. From our perspective, they lacked enlightenment. What is most significant is that even in relative ignorance and in the midst of “the end of the world” when all were dying around them, some Medieval Europeans were able to organise themselves to outwit a profound threat they had never encountered before. Clever move. It was not their expert culture, since officialdom had concluded that the cause lay in an unfavourable conjunction of planets. No help there. Rather, it was about making reasonable real world inferences, having a clear plan and then putting it into action.

Surely anyone ought to be able to do that, faced with a less transmissible plague and having far greater knowledge in 2014?

Sunday, 10 August 2014

Ebola and the morality of governments


Medecins sans Frontieres are working in Liberia, and have given an interview today to the BBC saying that Liberia’s official figures were "under-representing the reality", and that the health system was "falling apart".

The MSF co-ordinator for Liberia, Lindis Hurum, told the BBC: "Our capacity is stretched beyond anything that we ever done before in regards to ebola response." She said five of the biggest hospitals in the capital Monrovia had closed for more than a week. "Some of them have now started to re-open but there are other hospitals in other counties that are just abandoned by the staff. We are definitely seeing the whole health care system that is falling apart."

The BBC story picture shows a sidewalk notice board showing Ebola related news, including a scoreboard with “Ebola 7 Govt 1” which shows that at least one chalkboard blogger is maintaining the dry humour of an independent mind. Less reassuring is the news that a Govt Minister has had to explain that the local nut-based cola, Bitter Kola, is not a cure for the disease.

Liberia was colonized by African Americans in 1820. The Wikipedia entry estimated life expectancy to be 57.4 years in 2012, a fertility rate of 5.9 births per woman, maternal mortality at 990 per 100,000 births in 2010. Communicable diseases are widespread, including tuberculosis, diarrheal diseases and malaria. Liberia imports 90% of its rice, a staple food, and is extremely vulnerable to food shortages. In 2007, 20.4% of children under the age of 5 were malnourished. In 2008, only 17% of the population had access to adequate sanitation facilities.

To put it mildly, Liberia is not a poster child of governance. This leads to a dilemma: if a country cannot protect its citizens from a disease which requires soap, disinfectant and body bags, do we send in extra support for international health agencies to do the job that Liberians cannot do, and then do the same across West Africa, or do we send them instructions and exhortations and hope for the best?

Friday, 8 August 2014

Ebola unsolved: WHO spells out the basics

The first time I took part in a World Health Organisation Working Party I was taken downstairs after the morning meeting for lunch in the luxurious canteen, where the food was much better and the view more alluring than in the medical school staff dining room back in London.

As we went back to start the afternoon session my host pointed to a very large, two storey high convoluted and paint be-splattered painting by Jackson Pollock (or by a close cousin of that esteemed artist) which adorned the central lobby. I looked at it with little relish, at a loss as to what to say. “By common consent” my host remarked “this is the clearest depiction of the WHO organisational structure”.

The World Health Organisation, for only the third time in recent years, has issued a Public Health Emergency of International Concern (PHEIC) warning. I have my doubts about any committee process, but I have picked out the main points from the statement. The extraordinary thing is that it has been written almost as if it were a training manual for aspiring public health workers.

The current EVD outbreak began in Guinea in December 2013. This outbreak now involves transmission in Guinea, Liberia, Nigeria, and Sierra Leone. As of 4 August 2014, countries have reported 1 711 cases (1 070 confirmed, 436 probable, 205 suspect), including 932 deaths. This is currently the largest EVD outbreak ever recorded. In response to the outbreak, a number of unaffected countries have made a range of travel related advice or recommendations.

Several challenges were noted for the affected countries:

  • their health systems are fragile with significant deficits in human, financial and material resources, resulting in compromised ability to mount an adequate Ebola outbreak control response;
  • inexperience in dealing with Ebola outbreaks; misperceptions of the disease, including how the disease is transmitted, are common and continue to be a major challenge in some communities;
  • high mobility of populations and several instances of cross-border movement of travellers with infection;
  • several generations of transmission have occurred in the three capital cities of Conakry (Guinea); Monrovia (Liberia); and Freetown (Sierra Leone); and
  • a high number of infections have been identified among health-care workers, highlighting inadequate infection control practices in many facilities.

The statement then goes on to give States with Ebola transmission some practical advice, including that they should:

provide immediate access to emergency financing to initiate and sustain response operations; and ensure all necessary measures are taken to mobilize and remunerate (my emphasis) the necessary health care workforce; meet regularly with affected communities and to make site visits to treatment centres; establish an emergency operation centre to coordinate support across all partners, and across the information, security, finance and other relevant sectors, to ensure efficient and effective implementation and monitoring of comprehensive Ebola control measures. These measures must include infection prevention and control, community awareness, surveillance, accurate laboratory diagnostic testing, contact tracing and monitoring, case management, and communication of timely and accurate information among countries. For all infected and high risks areas, similar mechanisms should be established at the state/province and local levels to ensure close coordination across all levels.

States should ensure that there is a large-scale and sustained effort to fully engage the community – through local, religious and traditional leaders and healers – so communities play a central role in case identification, contact tracing and risk education; the population should be made fully aware of the benefits of early treatment.

It is essential that a strong supply pipeline be established to ensure that sufficient medical commodities, especially personal protective equipment, are available to those who appropriately need them, including health care workers, laboratory technicians, cleaning staff, burial personnel and others that may come in contact with infected persons or contaminated materials.

In areas of intense transmission (e.g. the cross border area of Sierra Leone, Guinea, Liberia), the provision of quality clinical care, and material and psychosocial support for the affected populations should be used as the primary basis for reducing the movement of people, but extraordinary supplemental measures such as quarantine should be used as considered necessary.

States should ensure health care workers receive: adequate security measures for their safety and protection; timely payment of salaries and, as appropriate, hazard pay; and appropriate education and training

States should ensure that: treatment centres and reliable diagnostic laboratories are situated as closely as possible to areas of transmission; that these facilities have adequate numbers of trained staff, and sufficient equipment and supplies relative to the caseload; that sufficient security is provided to ensure both the safety of staff and to minimize the risk of premature removal of patients from treatment centres; and that staff are regularly reminded and monitored to ensure compliance with Infection Prevention and Control.

States should conduct exit screening of all persons at international airports, seaports and major land crossings, for unexplained febrile illness consistent with potential Ebola infection. The exit screening should consist of, at a minimum, a questionnaire, a temperature measurement and, if there is a fever, an assessment of the risk that the fever is caused by EVD. Any person with an illness consistent with EVD should not be allowed to travel unless the travel is part of an appropriate medical evacuation.

There should be no international travel of Ebola contacts or cases, unless the travel is part of an appropriate medical evacuation. To minimize the risk of international spread of EVD:

  • Confirmed cases should immediately be isolated and treated in an Ebola Treatment Centre with no national or international travel until 2 Ebola-specific diagnostic tests conducted at least 48 hours apart are negative;
  • Contacts (which do not include properly protected health workers and laboratory staff who have had no unprotected exposure) should be monitored daily, with restricted national travel and no international travel until 21 days after exposure;
  • Probable and suspect cases should immediately be isolated and their travel should be restricted in accordance with their classification as either a confirmed case or contact.
  • States should ensure funerals and burials are conducted by well-trained personnel, with provision made for the presence of the family and cultural practices, and in accordance with national health regulations, to reduce the risk of Ebola infection.


There is even more detail in the document, but I think you get the drift of it. The World Health Organisation has to be polite and helpful, because that is part of their consensus building remit, but any halfway competent Parish Council in a remote European backwater would be offended to receive such a document, because it makes it clear that the afflicted countries have not been able to organise themselves to give their citizens basic health protection, nor have they managed to get through to them the elementary processes of disease control.

Coincidentally, I have just been reading the draft of an upcoming paper by Heiner Rindermann on the testing of Piaget’s stage of formal operations among a small sample of Germans and Nigerians. Despite the Nigerians being well educated there are very significant gaps of understanding on their part about practical health matters, and a very much higher belief in the efficacy of prayer. Replicating this result on a much larger and more representative sample will be interesting.

Press reports suggest that the official case numbers are a gross underestimate, that many health workers have abandoned their posts, that no attempts have been made to trace the contacts of confirmed cases, and that the bodies of Ebola victims are frequently left unburied. Currently, the Ebola virus seems to be doing very well, all 7 genes of it.

Another way to fiddle exam marks


Years ago I was appointed External Examiner in Psychology at another medical school. I looked at all the papers of students with grade discrepancies between the two internal examiners, and gave my opinion as to what their final marks should be. I also looked at another 20% of the papers to compare those with the list of students with disputed marks, just to calibrate the marking, then looked at the top marked students in detail to agree the prize winners, and looked at the bottom scorers in even more detail to see who should spend the summer re-sitting the Behavioural Science exam. Then I looked at the distribution of the marks, which was fairly tightly distributed round the magic 55% figure, which ensured that most people passed, and thus did not spoil the teachers’ summer by requiring extra teaching to pass the re-sit exam. I think I did a plot of the scores, and planned to discuss widening the range of scores with the internal examiners in Psychology.

All pretty straightforward, but time consuming. Examining is not a well paid occupation, more of a chore than an honour, with some aspects of religious ritual. I felt I had completed my task competently. It was a medical school at which I had taught part of a psychology course for many years. I knew the teachers to be dedicated and enthusiastic. I knew I would have to attend an Examiners’ meeting at which I would make a few suggestions for improvements, commend the course, but would then quickly get back to work at my own medical school.

At this point, having announced that I had completed my task, my colleagues warned me that they and their Psychology course were under attack from other departments, because the Psychology failure rate was seen as unacceptably high. The other traditional courses had failed fewer students, and the occasional severe failure to meet the Psychology standard might lead to the loss of a student who had passed Anatomy and Physiology. By way of background, Psychology and Sociology had been forced on Medical Schools by a Government enquiry, the Todd Report, which sought to ensure that doctors were more patient-focussed, and more aware of psychological issues. This was resented by the older traditional subjects, who hated having lost teaching time to these upstart and probably Marxist intruders. Yes, dear reader, I was part of a revolutionary cadre, overturning the ancient cultures of conceit: the Che Guevara of communicating with patients.

Now look at my first paragraph, and spot my most significant omission. As External Examiner I knew the Psychology syllabus at most medical schools. They were all independent, but all covered the same sorts of key issues, with some slight differences: patient communication, psychology of pain, the placebo response, anxiety, depression etc. I knew at a glance that the exam questions (which I had moderated before they were set) were a fair representation of the course as taught. I knew that the questions were a sub-sample of all the possible questions that could have been set. It was necessary to revise much of the course in order to be sure of getting questions you could answer. Still wonder what’s missing before I can judge the Psychology scores against the other subjects? Look at the sequence in a logical order: assume the Todd Report correctly defined the national standard for Behavioural Science, of which Psychology was part; assume that the Syllabus was a fair representation of the national standard; assume that the Exam was a fair representation of the course as taught (not every subject taught will be examined in any particular year); and then assume that the Exam had been properly marked, with two internal examiners working independently, then consulting each other afterwards with their marks, then turning to the External Examiner to resolve differences. Perfect?

If you present experts with a fault tree they tend to believe that it has covered all possible problems (even if about a third of it is missing). Since experts are rational people, and mostly good natured, they tend to have difficulty believing that some people will do stupid, dishonest and malevolent acts.

My Psychology colleagues explained to me that the reason the Physiology exam never had failures, or very very few failures, is that they always did a “revision” teaching session at the end of term, always very well attended, at which they discussed the sorts of topics “which might come up in the exam”. Even the dullest rugger bugger medic could pick up the clues. Passes all round.

Psychology did not do this, either out of honesty or plain innocence, imagining that exams ought to be a test of what students actually knew.

The official Examiners’ meeting was chaired by the Dean of the Medical School. This being London he was already dressed in his legal robes as a Queen’s Counsel, since he was about to go off to the High Court on another, presumably far more important, matter. As predicted, the Physiologists made their attack: “Psychology, a new subject, is being far too harsh, and is failing students whom we know to be perfectly good future medics, who have done well in our Physiology exam”.

The Dean turned to me slowly, raised an eyebrow, and with infinite politeness said: “Dr Thompson?” I smiled in a manner which I hope could be described as understanding and even sweet, and replied that the overall results on examinations often depended largely on the extent to which the questions could be predicted by students, and that perhaps, just possibly, the Physiology questions were habitually more predictable than the Psychology questions.

The Dean accepted our marks without further discussion, and some students had a summer in which they learned some Psychology. Whether they became better doctors is hard to say.

Thursday, 7 August 2014

Do universities award honest grades?


National scholastic exam results should be honest, firstly because honesty is the best policy in a moral sense; secondly because honest results transmit the most information; thirdly because honest results lead to the best candidates getting the best jobs, which is the best for society; and fourthly because honest results allow scholastic institutions to be evaluated and improved.

None of this is the case for the most important scholastic results: university grades. Each university teaches their own version of each discipline and examines it in their own way. Sure, there are external examiners, and they can sometimes guide an errant institution towards best practice, but it is an uphill struggle.

So, universities have to make up the final degree results. Getting a ranking of the students is the easy bit, particularly because with sufficient observation one can probably work out which are the strongest students in the first term. Then the tricky bit heaves into view. Do we want to be honest about what the students actually know, or do we find it expedient to make them, and the university, look good? Difficult question isn’t it? The answer takes about 5 seconds. The university does not want to admit that they have let in a bunch of dullards, that the teachers are incompetent, the courses misconceived, the exams too easy and the whole institution a refuge for inebriated idiots. However true, it is best not to disclose this to students, parents and grant-giving bodies, let alone to the locals who are wearily familiar with the institution’s many shortcomings. Therefore each department decides upon the level of mendacity required to make them look good as teachers, and to keep their students happy. They set an average grade which makes most students appear good enough, and a substantial minority to appear to be very good indeed. No-one does very badly. All must have prizes.

Students are now consumers. They have consumed industrial quantities of stupefying substances, have avoided most forms of academic enlightenment and effort, yet are entrusted with providing the data on which university teachers will be rewarded and promoted, by giving their hazy recollections about which teachers made them laugh earlier in the term.

For hard pressed university teachers, exam marking is simplified by this artful procedure. In advance they decide on the average grade which will fulfil the above mentioned market requirements. Some better students will be chosen to get marks somewhat above that Platonic average, and a few will get marks a little below it, in a simulacrum of academic judgment. Standard deviations narrow, distributions are skewed toward respectably higher marks, and nasty scenes and confrontations are avoided.

Into this maelstrom of deceit swim Butcher, McEwan and Weerapana, economists from Wellesley College in Massachusetts, to report on what happens when the sloppy inflationist running dog departments at that institution (Spanish, Women’s Studies, Italian, Chinese, Anthropology, Africana Studies, English) are brought into line with the cool and restrained marking of their more honest colleagues (Astronomy, Physics, Mathematics, Geology, Economics, Quantitative Reasoning, Biological Sciences, Chemistry).

The Effects of an Anti-Grade-Inflation Policy at Wellesley College. Journal of Economic Perspectives—Volume 28, Number 3—Summer 2014—Pages 189–204

This paper evaluates an anti-grade-inflation policy that capped most course averages at a B+. The cap was binding for high-grading departments (in the humanities and social sciences) and was not binding for low-grading departments (in economics and sciences), facilitating a difference-in-differences analysis. Professors complied with the policy by reducing compression at the top of the grade distribution. It had little effect on receipt of top honors, but affected receipt of magna cum laude. In departments affected by the cap, the policy expanded racial gaps in grades, reduced enrollments and majors, and lowered student ratings of professors.

The estimated drop in grades in treated departments is smaller for Latina students but much larger than average for black students (including African-Americans and foreign students who self-identify as black), those with low SAT verbal scores, and those with low Quantitative Reasoning scores.

In brief, the accuracy and honesty of the grades improve, though in the long run these departments drift towards their old habits of debasing the currency. Some racial differences increase, but overall the grades become more honest and more informative.
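For readers unfamiliar with the difference-in-differences method mentioned in the abstract, here is a minimal sketch of the arithmetic. The function name and all the grade figures are my own inventions for illustration, not numbers from the Wellesley paper, and the published analysis is a regression with controls rather than this bare subtraction. The idea is simply that the change in the uncapped departments is taken as the change that would have happened anyway, and is subtracted from the change in the capped ones.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Hypothetical mean grade points, invented purely for illustration.
effect = diff_in_diff(
    treated_before=3.6,   # high-grading departments before the cap
    treated_after=3.4,    # the same departments after the cap
    control_before=3.2,   # low-grading departments before
    control_after=3.2,    # low-grading departments after
)
print(round(effect, 2))
```

A negative result would indicate that grades in the capped departments fell relative to the trend in the uncapped ones.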

The downside is that Wellesley College students now look bad compared to others from more sloppy institutions. A Wellesley student who gets the controlled and restrained grade average score of 3.3 is inconvenienced in a market place where higher fake scores are the norm. Recruiting departments in desirable companies cannot hope to keep up with precise calculations as to how each university marks their exams. For ease of selection it would make sense for them to rank candidates by grade point average, and then glance at the awarding institutions afterwards.

Of course, if there were an agreed ranking of institutions (based on the SAT grade score averages of the entrants) it would be possible, assuming employers have the interest and the ability to apply the corrections, to create a new national ranking system. Possible, but difficult and time consuming. The authors make a final lament:

Any institution that attempts to deal with grade inflation on its own must consider the possibility of adverse consequences of this unilateral disarmament. At Wellesley College, for example, prospective students, current students, and recent alums all worry that systematically lower grades may disadvantage them relative to students at other institutions when they present their grades to those outside the college. They point to examples of web-based job application systems that will not let them proceed if their GPA is below a 3.5. The economist’s answer that firms relying on poor information to hire are likely to fare poorly and to be poor employers in the long run proves remarkably uncomforting to undergraduates. These concerns lead to pressure to reverse the grade policy. If grade inflation is a systemic problem leading to inefficient allocation of resources, then colleges and universities may wish to consider acting together in response.

It is the tragedy of the commons all over again. Debased institutions confer mendacious advantage to their students and garner resources whilst honest institutions hamper their students in the market place of life. You and I know what the procedure should be: standardise the scores on a national basis, taking into consideration student grade point totals on key subjects prior to university entrance as a way of grading the institutions and correcting for institution heterogeneity. The current system is measuring students with a rubber ruler, not with a platinum meter in a vault (or more precisely the path travelled by light in vacuum during a time interval of 1/299 792 458 of a second). Someone from on high has to force change and bring the degree counterfeiters into line. Some modern day Newton, as Master of the Mint of Graduates.
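One way the standardisation procedure described above might be sketched is shown here. Everything in it is an assumption of mine rather than an agreed method: the function name, the use of entrants' prior attainment (mean and spread on some national test) as the anchor, and the invented figures. It trusts each institution only for the rank order of its own students, which is the easy bit, and takes the level and spread from what the students achieved before they arrived.

```python
from statistics import mean, pstdev


def standardise(gpas, entrant_mean, entrant_sd):
    """Re-express within-institution GPAs on the scale of entrants' prior attainment.

    Each GPA becomes a z-score within its own institution, and is then
    rescaled by the (assumed known) mean and spread of that institution's
    entrants on a national pre-university measure.
    """
    mu = mean(gpas)
    sigma = pstdev(gpas)
    if sigma == 0:
        # Everyone got the same grade: the institution tells us nothing
        # beyond what we already knew from the entrants' scores.
        return [entrant_mean for _ in gpas]
    return [entrant_mean + entrant_sd * (g - mu) / sigma for g in gpas]


# Hypothetical institutions, invented for illustration only.
lenient = standardise([3.7, 3.8, 3.9], entrant_mean=500, entrant_sd=50)
strict = standardise([3.0, 3.3, 3.6], entrant_mean=600, entrant_sd=50)

print([round(x, 1) for x in lenient])
print([round(x, 1) for x in strict])
```

On these made-up numbers the middling student at the strict institution, with a raw 3.3, ends up ranked above the top student at the lenient one, with a raw 3.9. Whether entrance scores are a fair anchor is of course a separate argument.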

Until then, university grades are primarily a moral issue.

Wednesday, 6 August 2014

CCACE 7th Annual Research Day


Edinburgh is the city that Athens could have been if only it had a constant supply of drizzle. Faced with an inclement Northern climate, the disputative people on this rocky outcrop by the North Sea have thrown cantankerous thunderbolts at all and sundry, though mostly South. This little conference put together by the Deary gang may be your last chance to see Edinburgh before it casts off the surly bonds of English sovereignty and sails into glorious independence, penury and clan warfare, bewildered upon a peak in Darien. (It was their last disastrous venture in Central America which cast the bankrupted Scots into the hands of their ancient enemies).

Here is the pre-Independence bill of fare:

Wednesday 27th August 2014

Room F21, Department of Psychology

7 George Square, Edinburgh EH8 9JZ

Draft Programme

12.00 noon Lunch and poster session (lower concourse)

1.00pm Welcome from CCACE Director, Professor Ian Deary

1.15pm Keynote presentation

Title: “The ENIGMA Project: Investigating Brain Diseases with Imaging and Genetics in 29,000 People”

Professor Paul Thompson, Brain Research Institute, UCLA

2.15pm “La Nouvelle Vague”: Some recent large research programmes

European Prevention of Alzheimer’s Dementia Consortium, Craig Ritchie

“STratifying Resilience and Depression Longitudinally (STRADL) – Wellcome Trust strategic award”, Andrew McIntosh

"Development of a software application for detection and monitoring of attentional deficits in delirium" (MRC Developmental Pathway Funding Scheme), Zoe Tieges

2.45pm CCACE PhD student talk

“Life after CCACE: What was my PhD about, what did it give me, and what did I do with it?”, Donald Lyall

3.00pm Tea/coffee break (lower concourse)

CCACE’s Research Groups

3.15pm Premiere of Anne Milne’s “The Living Brain”, a Lothian Birth Cohort film

3.30pm Cognitive epidemiology

“Psychological distress as a risk factor for death from a variety of causes”, Tom Russ

3.45pm Human cognitive ageing: Individual differences

“Is the world too fast when we’re slowing down?”, Stuart Ritchie

4.00pm Human cognitive ageing: Human cognitive neuroscience

Three short talks from current PhD students

4.00-4.04 - “The effect of funding sources on donepezil randomised controlled trials”, Lewis Killin

4.04-4.08 - “Cognitive advantage in bilingualism: an example of publication bias”, Angela de Bruin

4.08-4.12 - "Autobiographical thinking interferes with episodic memory consolidation", Michael Craig

4.12-4.15 - Discussion on HCN presentations

4.15pm Mechanisms of cognitive ageing

“Delirium and cognitive decline: What is the pathological basis?”, Daniel Davis

4.30pm Genetics and statistics of brain ageing

“DNA methylation and aging in the Lothian Birth Cohorts”, Riccardo Marioni

4.45pm Human and animal brain imaging

“From birth to old age: New imaging of brain ageing”, David Dickie

5.00pm Closing remarks and invitation to drinks reception, Professor Ian Deary

5.10pm Drinks reception (lower concourse)