The mind’s construction in a face

Of course, this is only a bit of fun, but presumably you can tell how bright a person is just by looking at them? Such confident judgments are anathema to proper clinical psychologists, who would rather spend an hour giving a Wechsler intelligence test than stoop to such populist nonsense.

Now Karel Kleisner, Veronika Chvátalová, and Jaroslav Flegr have decided to put this silly stereotype to the test in a PLOS One paper “Perceived Intelligence Is Associated with Measured Intelligence in Men but Not Women” and find it not so silly, at least as far as men’s faces are concerned. (As my readers already know, a stereotype is an insight waiting to be proved.) Perhaps the girls are so exclusively judged on prettiness that their intellectual countenances are ignored, whilst boys’ faces can be judged for both intellectual and sexual purposes.

Let us get the criticisms in quickly. The sample size is small (n=160), and more importantly the faces are from university students and the raters are also university students (mean IQ 125 sd 17). This could be a case of bright people recognising other bright people. There is a restriction of range problem, and the authors should try a representative sample of faces and raters, and are likely to get better results.
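
For readers who like to see the restriction of range problem in numbers, here is a minimal sketch. Everything in it is an assumption made for illustration: a hypothetical whole-population correlation of 0.4 between measured IQ and a perceived-intelligence rating, and a hypothetical selection cutoff of IQ 115. It is not a reanalysis of the authors' data.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=20000, true_r=0.4, cutoff=115, seed=1):
    """Simulate measured IQ and a perceived-intelligence rating.

    All numbers are hypothetical and chosen only for illustration.
    Returns (correlation in everyone, correlation in those above the cutoff).
    """
    rng = random.Random(seed)
    iq, rating = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        iq.append(100 + 15 * z)
        # The rating correlates with IQ at about true_r in the whole population.
        rating.append(true_r * z + math.sqrt(1 - true_r ** 2) * noise)
    full = pearson_r(iq, rating)
    kept = [(i, r) for i, r in zip(iq, rating) if i > cutoff]
    restricted = pearson_r([i for i, _ in kept], [r for _, r in kept])
    return full, restricted


full_r, restricted_r = simulate()
print(f"whole population r = {full_r:.2f}, selected sample r = {restricted_r:.2f}")
```

Under these assumptions the correlation in the selected group comes out well below the one in the whole population, which is why a representative sample might be expected to do better.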

I presume no-one was photographed with glasses on, though they “avoided cosmetics, jewellery and other decorations”. On the plus side, they have published their entire data set.

In general, I like the way the authors have presented this paper. They admit that their method of analysing the composition of faces shows no relation to measured IQ, yet that there must be something about the pictures of the men’s faces which allows the positive predictions of their intelligence to be generated. The authors say that this must be due to a cultural stereotype. Weak argument. Where on earth would such a “stereotype” come from? If cultural stereotypes mean anything they would be random, and have low predictive power. This reliance on the notion of “cultural stereotype” is a crucial misunderstanding on their part, because it does not explain how a correct stereotype comes about, other than by someone noticing something which is true.

Their line of best quadratic fit I found something of a disappointment. Above IQ 140 the strength of the prediction falls considerably, and these paragons of intellect are seen as pretty stupid. In statistical terms these outliers are freaks, so in evolutionary terms it might not be worth detecting them. Or they carry so much mutation load that they look awful.

In both sexes, a narrower face with a thinner chin and a larger prolonged nose characterizes the predicted stereotype of high-intelligence, while a rather oval and broader face with a massive chin and a smallish nose characterizes the prediction of low-intelligence.

Do you have a gracile face? For once in my life my larger nose seems to be a benefit in generating a positive stereotype about me.  Do you look like the clever person you actually are, deep inside? If you wish to comment, please append a photograph. If you are over IQ 140 you may omit the photo.

Rock and Roll


Lest it be thought I lead too quiet a life dwelling on the minutiae  of psychometry, I spent most of yesterday partying with rockers in a secret London location as the guest of Richard Thompson OBE (no relation) whose gig at the Half Moon on 4 April has sold out in 10 minutes. Since Fairport Convention he has achieved fame as a solo artist and has an extensive and extremely loyal fan base. His songs have been covered by everyone in the business, and I have witnessed adoring fans standing in the pouring rain at Fairport’s Cropredy Convention in Oxfordshire, hanging on every word and chord.

In a respectful obeisance to rock tradition, he wore blue suede shoes (“Crepe, actually” the great man said).


Also there was Jo “Nashville Rock with an English Accent” Burt who played with Black Sabbath, and whose wife and backing singer Antonia talked about their new album, suggesting “The Mess” as the track which would be of most psychological interest to my readers. The picture shows Antonia, with Richard Thompson in the background and Jo Burt, looking away to his left.


Third up, a new duo launching their latest album next month, but once again for some reason my photograph shows the female part of the combination with somewhat greater emphasis than her truncated male partner.


Following rock tradition, the rest of the party becomes a bit hazy, so the names have become somewhat jumbled, and I have run out of links. I am told that all of us danced to the classic tracks. There was a whole lot of shaking going on. You will have to help me recall the rest of it. Meet on the ledge.

#MH370 Reincarnation and sea junk

It would appear that, despite collecting data for several decades, we do not have baseline estimates for sea junk per area of ocean. Our watery world is crisscrossed by a conveyor belt of ships carrying container loads of materials, a portion of which fall into the water, joining the rubbish deliberately thrown overboard from ships and the rubbish which makes its way into the oceans from the stuff we throw into rivers and leave on beaches.

Baseline measures aren’t sexy. One unintended consequence of the search for flight MH370 is that we will have learnt that even the far reaches of the southern Indian Ocean, deep in the roaring forties, have generous scatterings of man made rubbish. Perhaps we will even be able to quantify this in terms of number of discriminable man made objects per thousand square miles.

Note that, if the number is very low (and the more appropriate measure turns out to be objects per ten thousand square miles, or even a hundred thousand square miles, which is a little over the size of the United Kingdom) this would strengthen the significance of finding any object floating in the ocean. Signal detection would become a little easier. We could argue, as the Malaysian government officials have done (they are not having a good time, are they?) that floating junk means floating plane junk. Find some junk and the plane will be on a sea bed somewhere upstream from the sea currents, if those can be calculated with any degree of precision.

On the other hand, if the number of floating objects is high, then the task becomes even more difficult, and pushes us towards the next problem: can crashed plane junk be discriminated by satellite or observer plane from all the other junk or do we have to rely on retrieval by ship of every likely floating object?

This question came into my mind a few minutes ago, when the revered BBC website displayed a picture taken by a journalist from a New Zealand plane showing a white floating rectangle. “I am no expert” as people say in the Twitter-sphere, (before launching into an elaborate speculation) but it is not immediately apparent to me how this object potentially relates to a crashed airliner. It is very probably nothing to do with the skin of the plane, nor does it look like any inner section, or any type of cargo. However, it is man-made, and floating.

So, how are our probability estimates looking at the moment? It seems that the range of the aircraft is fuzzier than previously disclosed. The plane was travelling faster than previously envisaged, thus burning more fuel, and therefore travelling less deeply into the southern wastes. If one plots out the error arc of the Inmarsat calculations, and the error range of the speed and fuel calculations, quite a chunk of ocean remains in contention. (I do not know how much, and wonder if anyone else does).

So, which way would you gamble, using Bayesian techniques? Three main components to be factored in are as follows: 1) area to be searched (ranging from the highly probable to the less probable) 2) the time left before the black box pinger battery gives up, and 3) search efficiency.

My rough calculations would be that: the search area remains too large; the pinger will fade to almost nothing in another 15 days (though cold water might extend that time) and search efficiency is extremely low. This latter point was well studied by the mathematician Bernard Osgood Koopman who wrote the first proper handbook for searches at sea in the 1940s. Looking at the sea is boring, you cannot scan the whole area so it is best to look a little down from the horizon, you should change places every 15 minutes to lift your alertness, you should start with probable places and move to slightly less probable places but ignore possible but improbable places (success is unlikely when probability of a target is low, and visual search is inefficient) and no plane can fly for ever.

My bet would be that, absent any more refinement in the calculation of impact location and subsequent drift, the searcher must gamble, and should maximise the area that can be searched. The area closer to Perth maximises the proportion of the target area that can be searched. Look where the light is brightest, particularly when the light is about to go out.
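
A rough sketch of that gamble, in the spirit of Bayesian search theory. The three zones, their prior probabilities and their detection chances are all hypothetical numbers invented for illustration, and the function names are mine. The idea is only that the best next sortie goes to the zone where prior probability times chance of detection is highest, and that an empty-handed search shifts probability to the other zones.

```python
def update_after_failed_search(priors, searched, p_detect):
    """Bayes update of location probabilities after one zone is searched and nothing is found.

    priors   : dict zone -> prior probability the wreck is there (sums to 1)
    searched : the zone that was searched
    p_detect : chance of spotting the wreck in that zone if it really is there
    """
    p_nothing_found = 1 - priors[searched] * p_detect
    posterior = {}
    for zone, p in priors.items():
        if zone == searched:
            posterior[zone] = p * (1 - p_detect) / p_nothing_found
        else:
            posterior[zone] = p / p_nothing_found
    return posterior


def best_zone(priors, p_detect_by_zone):
    """Zone with the highest chance of a find on the next sortie: prior x detectability."""
    return max(priors, key=lambda zone: priors[zone] * p_detect_by_zone[zone])


# Hypothetical numbers, for illustration only. Zones nearer Perth get a higher
# detection chance because planes spend more of each sortie actually searching.
priors = {"far south": 0.5, "middle": 0.3, "nearer Perth": 0.2}
p_detect = {"far south": 0.1, "middle": 0.3, "nearer Perth": 0.6}

first = best_zone(priors, p_detect)
posterior = update_after_failed_search(priors, first, p_detect[first])
print(first, {zone: round(p, 3) for zone, p in posterior.items()})
```

With these made-up figures the zone nearer Perth is searched first even though it has the lowest prior, because search efficiency there is so much better.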

And finally, a word about reincarnation. About forty years ago I read somewhere, possibly in the Pali canon of sayings of the Buddha or a commentary, a line about the chance of someone being born without having been reincarnated. The chance was rated as being “as low as the chance that a turtle that rises to the surface once in a thousand years will put its head through a life belt cast upon the Indian ocean”.

Can someone look it up for me? I am busy searching for a missing plane.

#MH370 : public and private understanding


Sitting in TV studio waiting rooms is a good way to meet and listen to experts with technical knowledge. Some days ago I had referred to the missing Malaysian airliner as posing us the ultimate IQ test. It now seems the test was solved in a few days, at least as regards probable location, if not probable motivation.

The Inmarsat story is a very interesting one, and is slowly being disclosed. As of this early morning the account was that there were 7 hourly “pings” to serve as the data points. Now it turns out that there was an 8th incomplete ping following shortly after the 7th, about 10 minutes later, and not at the usual hourly interval. This may have been an “exception report” coinciding with the moment of impact.

The early story was that, given these 7 hourly pings, Inmarsat was able to work out very quickly that they were consistent with transmissions coming from somewhere on an arc running North to South roughly from the point of the last radar contact with the plane. The presumption was that the satellite in geostationary orbit could calculate a possible arc from which the transmissions might have been sent, but no more than that. The increasing delays of transmission from plane to satellite might have been due to the plane travelling north or equally, south. How to resolve this directional issue?

Here is the Malaysian government’s explanation as to how Inmarsat did this:

In recent days Inmarsat developed an innovative technique which considers the velocity of the aircraft relative to the satellite. Depending on this relative movement, the frequency received and transmitted will differ from its normal value, in much the same way that the sound of a passing car changes as it approaches and passes by. This is called the Doppler effect. The Inmarsat technique analyses the difference between the frequency that the ground station expects to receive and that actually measured. This difference is the result of the Doppler effect and is known as the Burst Frequency Offset.

The Burst Frequency Offset changes depending on the location of the aircraft on an arc of possible positions, its direction of travel, and its speed. In order to establish confidence in its theory, Inmarsat checked its predictions using information obtained from six other B777 aircraft flying on the same day in various directions. There was good agreement.

While on the ground at Kuala Lumpur airport, and during the early stage of the flight, MH370 transmitted several messages. At this stage the location of the aircraft and the satellite were known, so it was possible to calculate system characteristics for the aircraft, satellite, and ground station.

During the flight the ground station logged the transmitted and received pulse frequencies at each handshake. Knowing the system characteristics and position of the satellite it was possible, considering aircraft performance, to determine where on each arc the calculated burst frequency offset fit best.

The analysis showed poor correlation with the Northern corridor, but good correlation with the Southern corridor, and depending on the ground speed of the aircraft it was then possible to estimate positions at 0011 UTC, at which the last complete handshake took place.
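
To get a feel for the size of the effect being described, here is a minimal sketch of the basic first-order Doppler relation, shift = carrier frequency × radial speed / speed of light. This is emphatically not Inmarsat's method, which also has to account for the satellite's own motion and the system characteristics mentioned above. The carrier frequency of about 1.6 GHz and the 100 m/s line-of-sight speed are assumptions chosen for illustration.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def doppler_shift_hz(carrier_hz, radial_speed_m_s):
    """First-order Doppler shift for a transmitter moving relative to a receiver.

    radial_speed_m_s > 0 means the two are closing; < 0 means they are separating.
    """
    return carrier_hz * radial_speed_m_s / SPEED_OF_LIGHT_M_S


# Hypothetical values: a carrier of about 1.6 GHz and a plane whose speed
# along the line of sight to the satellite is 100 m/s.
carrier = 1.6e9
approaching = doppler_shift_hz(carrier, 100.0)
receding = doppler_shift_hz(carrier, -100.0)
print(f"approaching: {approaching:+.1f} Hz, receding: {receding:+.1f} Hz")
```

On these assumptions the shift is a few hundred hertz on a carrier of more than a billion, which gives some idea of the precision of the logging involved.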

Burst frequency analysis is apparently well known, but to make offset calculations on Doppler-like effects so as to infer location is an innovation. It would seem that when tested on real plane data these Doppler effect calculations matched the Southern arc better than the Northern arc. I still need to go through some further steps of understanding, but it seems very neat work, done very quickly. Once the most likely corridor of the flight path had been worked out, then calculations could be made on the fuel range of the aircraft so as to plot a likely impact location where both calculated ranges intersected. Dropping sonar buoys in that area might pick up the last feeble transmissions from the black box. Finding debris will be another matter, and finding the wreck itself with the much prized black box even more problematical, and possibly not feasible.

Real time reporting of black box type data from aircraft is likely to be made mandatory, and eventually the black box will be as redundant as a library.

What these calculations show is that a minority can deal with probabilistic hypotheses based on statistical and scientific concepts, and usually those calculations are relatively private; and a majority would like to see publicly testable, absolutely tangible proofs, in the form of bodies and wreckage.

The private discussion has to be impersonal, detached, open to improvement and criticism, and rigorous when searching for errors. Reportedly, Inmarsat researchers eventually realised that a satellite in geostationary orbit is not in fact totally stationary, and by correcting for some drift were able to refine their location estimates. According to some accounts it was this correction which revealed the southern arc as the stronger hypothesis.

The grieving relatives want certainty, and want it in public. Scientists can only offer probability estimates, with the detailed results in private, or in that strange space, academic publication, where you have to know quite a bit about the subject in order to evaluate the paper, and can only discuss it with a few other people.

To cap it all, also not disclosed, quite properly, are the ways in which each airline deals with hijackers/terrorists. For example, it would not be good security to have it generally known how airlines intend to deal with hijackers, what distress codes they have put in place, how the cabin crew communicate with the pilots, and whether or not they can open the locked door to the cockpit, let alone why jets were not scrambled to shoot down the missing plane. Knowing this would aid public understanding. Keeping it private would assist security.

The Malaysian authorities released the wrong message. They announced certainty without even a shred of a wet paper napkin with the Malaysia Airlines logo on it. Private calculation understood by the few trumping the public bewilderment of the many.

They should have said something like: “The search continues for the plane and the passengers. All indications are that it came down in the far south of the Indian Ocean. It is very unlikely that there are any survivors. We are still searching for wreckage of the aircraft”.

Although it is unlikely now, one hopes that one day the relatives may rest in peace, free of the anxious torment of sweet dreams cruelly dashed.


#MH370 and Goodness of Fit

Goodness of fit is an alluring concept. First, it is a good name for a statistical procedure, suggesting that one is on the side of the angels, and above all the devilish tricks of statisticians. Second, it describes how well a set of observations fits a theoretical model. The smaller the discrepancy between the observed values and the expected, model values, the better the fit.

Question is, fit with what? Chi square simply tells you the extent to which a particular frequency of observed values fits what would be expected from, usually, a chance model. It depends on a model, which in turn depends on a set of assumptions. In simple cases the chance model is fine. In more complicated cases the expected frequencies are somewhat harder to calculate.
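
For the simple case, a minimal sketch of the calculation: Pearson's chi square is the sum over categories of (observed − expected)² / expected. The die-rolling data below are hypothetical, invented only to have something to fit against a chance model.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 60 rolls of a die; the chance model expects 10 per face.
observed = [8, 12, 9, 11, 14, 6]
expected = [10] * 6
stat = chi_square_statistic(observed, expected)
print(f"chi-square = {stat:.1f} on {len(observed) - 1} degrees of freedom")
```

Here the statistic is 4.2 on 5 degrees of freedom, well short of the conventional 5% critical value of about 11.07, so these made-up rolls sit comfortably with the chance model. Change the model and the verdict changes with it.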

However,  that is by no means the main problem. Non-statisticians use a different heuristic, and count the number of points of concordance between a narrative and a set of observations. It is goodness of fit only in the sense of a comfortable and convincing similarity. Chi square it ain’t.

For example, in buying a car one might make a list of desirable characteristics, and then measure the extent to which each car “ticks the right boxes”. Car manufacturers know this, so they construct the list for you, and then reveal the perfect fit: “Our car has doors, and you wanted doors, so 1 point to us” and so on. Fitting the facts to a narrative often follows a similar, self-serving, confirmation bias. People tend to count the points of concordance without looking at individual probabilities.

In trying to make sense of the mysterious loss of Malaysia Airlines MH370 many people want to start with the narrative. In the well known example of a simple explanation: if there had been a fire on board, and if that fire damaged communication systems, and if the pilots had set a course to the nearest safe airport, then they might have been overcome by fumes and carried on flying in the same automatic Westerly direction until either the fire consumed the plane, or the fuel ran out. Most convincingly, the map of the safe airport showed it had the characteristics the narrative had required: an approach over water to a particularly long airstrip on which you would wish to land if you were on fire. Spooky.

Of course, it is better to assemble the facts first, and display them with their error terms. Some of those basic facts, and the error terms, have been difficult to track down. Not all of them. Inmarsat plotted two possible trajectories, with associated error bands, which appear to give objective guidance, even though they are based on very new types of inference. More confusing were the timelines of events, and now even the cockpit to control tower conversations, some of which have reportedly been challenged because of translation errors. The Malaysian Government read out an urgent note about apparent debris spotted by a Chinese satellite, but gave very large dimensions which turned out to be wrong.

The Bayesian approach would be to look at each of the assumptions in terms of probabilities, and then establish confidence limits for those probability estimates. The chain of assumptions contains some that can vary semi-independently, and others which appear to cancel each other out. If a plane has crashed somewhere, it is not likely to keep “pinging” in a way that can be received by an antenna on a satellite, since the capacity to “ping” depends on having an intact system with a power supply.

Equally, if a camera on a satellite shows something floating on the sea (or just under the surface) then the significance of that firstly depends upon the error rate of interpretation (that the signal relates to something solid, and not a pattern of waves) and more importantly, a probabilistic judgment about whether the something is likely to be part of a plane or a container, pallet, or glutinous conglomeration of pallets, plastic bags, bottles, trainers and yellow plastic ducks (all of which have polluted the oceans and improved our detailed knowledge of sea currents).

Doctors face these sorts of dilemmas every time they cannot reach a diagnosis. They generally ask for more tests, which may illuminate or delay the decision. Privately, they often use a frequency table, on the basis that frequent things occur frequently, and that is the best and most defensible guide to action.

Consider the following data from Flight Global showing why planes have been lost during level flight, which is usually the safest part of a plane journey. Sabotage 13, Loss of Control 8, Airframe 8, Explosion or Fire 4, Collision 4, Hijack 2, Ditching 1, Power Loss 1, Shot down 1, Unknown 4 (includes MH370).

In terms of prior probability you would go for sabotage as the primary suspect, followed by loss of control or airframe. Given that no debris was found at the point of last transmission, or nearby, that means that there was no bomb, no loss of control, no airframe disintegration, no explosion, probably no fire, no collision (pretty sure of that), no ditching, power loss or shot down (unless there is one hell of a cover up).

Looks like hijack, in the sense of hijack by pilot for reasons unknown. However, out of  45 planes falling down from level flight before this one, hijack accounted for 2 and unknown for 3. So, looks like unknown, possibly hijack.  Time to send some satellites to look at the debris to the West of the Maldives, just as a control case.
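
The frequency-table reasoning can be set out as a short sketch. The counts are the Flight Global figures quoted above; the step of discarding every cause except hijack and unknown simply follows my own argument and is an assumption, not an established finding.

```python
# Level-flight losses as quoted from Flight Global (MH370 is counted under "Unknown").
losses = {
    "Sabotage": 13,
    "Loss of Control": 8,
    "Airframe": 8,
    "Explosion or Fire": 4,
    "Collision": 4,
    "Hijack": 2,
    "Ditching": 1,
    "Power Loss": 1,
    "Shot down": 1,
    "Unknown": 4,
}

# Take MH370 itself out, leaving the previous cases as the base rate.
previous = dict(losses)
previous["Unknown"] -= 1
total_previous = sum(previous.values())
prior = {cause: n / total_previous for cause, n in previous.items()}

# Assumption from the argument above: no debris at the point of last contact
# rules out everything except hijack and unknown. Renormalise over those two.
still_open = {cause: previous[cause] for cause in ("Hijack", "Unknown")}
open_total = sum(still_open.values())
share = {cause: n / open_total for cause, n in still_open.items()}

print(f"previous cases: {total_previous}")
print(f"prior for sabotage: {prior['Sabotage']:.2f}")
print({cause: round(p, 2) for cause, p in share.items()})
```

Sabotage starts as the front runner at 13 of 45, and once the other causes are struck out the split between the two survivors is 2 to 3 in favour of unknown.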

Finally, at what level of cost will governments begin to lose interest? I predict by the 35th day after the disappearance, when the black box pinger stops, everything will be scaled back, and the searchers will return to the statistics lab, until a bit of the tailplane shows up years later in a fishing net.

MH370 Intelligence to the rescue?


In the exercise of their intelligence, a large number of citizens have decided to try to find the missing Malaysia Airlines MH370 flight. According to the Wisdom of Crowds, if a sufficiently large number of people guess the longitude and latitude of this vehicle and its unfortunate occupants, then we should be able to refine our search of the vast immensities of the world that conceivably could have been covered by this fully fuelled jet.

The wisdom of crowds, as you may remember, posits the view that because the pooled estimates generated by individuals guessing the number of beans in a glass jar are often close to the real number of beans, that there is a mysterious force guiding us to the correct solution, sadly hidden from individuals. Of course, readers of this journal may suspect that averaging estimates on tasks with low intellectual content can sometimes reduce error terms, particularly if you omit outliers, but this is no time to get nasty with authors of popular books.

Typically, humans being humans, these disparate citizens have eschewed the control condition of just guessing the location, as the jar of beans example requires. They have taken to poring over maps and searching satellite photos and referring to Google Earth. We are all searching, each in our own very special way.

I am in favour of speculation, and would not wish to interfere with anyone’s hobby, even if it involves flight simulators. There has been little need of heavy labour since the domestication of wheat, so people have to find things to do, myself included.

Better, this understandable wish to help solve a mystery has brought in some experienced pilots, with much to add in explaining the Pilot Point of View. Naturally pilots tend to stick up for pilots, but plane manufacturers stick up for plane manufacturers, and governments….. you get the drift. Chris Goodfellow, pilot, has put forward an interesting speculation, which is that a fire in the aircraft forced the pilots to change course to a nearby airport. The pilots, unknown to them, had already lost communication links, but set the autopilot toward the nearest and most suitable airport. Overcome with fumes and smoke the crew collapsed, but the plane continued in a straight course on autopilot until it ran out of fuel, somewhere just West of the Maldives, which is the area which should be searched, in his opinion.

There is much to like in the account. First, it is written by a pilot. They get to wear impressive uniforms. Second, the hypothesis is testable, in that it gives a location (roughly). Third, it is testable in that it suggests that, should the plane ever be found, fire damage will be found in some of the systems. Fourth, testable in that every body will probably be still in its seat, asphyxiated. There will be no piles of bodies relating to crowds trying to break down the cockpit door. Fifthly, the black box will show one change of course, and nothing else. Sixthly, it teaches us something about pilots, and they are worth learning about.

Let me describe this in a little more detail. As an 11 year old I spent a portion of every airliner flight in the cockpit. I did at least one landing in the company of my younger brother standing next to pilots as they landed at Carrasco airport, Montevideo. I recall that, in a late burst of health and safety awareness, in the final stages of the approach one pilot muttered as he coaxed the bucketing DC3 downward “Grab on to something”. One learnt a lot in those days.

In later years I always talked to the pilot. On Concorde the pilots concluded their 10 minute explanations about the controls, instrumentation, flying characteristics, thermal properties of aluminium alloys, and the intricacies of altering the centre of gravity to vary the angle of attack by ending on a studied, laconic note, describing what was one of the world’s fastest-ever aircraft thus: “It’s a good bit of kit”.

More ordinary planes had pilots who sometimes lamented their reduced condition. On a long flight to Tokyo one said to me “I am not a pilot, I am a systems engineer. Systems monitor, in fact”. He showed me how he was pumping fuel into the wingtip tanks to reduce stresses on the wings. He also showed me how he was monitoring “every airport that can take us”. This had not been a Latin American concern decades before. Sure enough, every suitable airport was on the moving map, with coordinates and runway characteristics available should we need to land quickly. A very worthwhile precaution.

By the way, most of these cockpit visits, though they linger in my mind, were very short. Pilots kept working, and some of the time I just stood there in silence, watching. The later flights were all in the hijacking era, but not in the hijack and suicide era. Different times.

However, there is a problem with all these attempted explanations. We are all relying on the belief that we know the actual sequence of events in the first hour or so of the flight. I suppose it is conceivable that a fire somewhere in the airframe should have eaten its way through one automatic reporting system while leaving the radio intact. The timing of the loss of systems becomes absolutely critical in this analysis, as indeed in all of the analyses. Is there a sequence of events which is agreed, and trustworthy? Not quite.

Last contact was said to be at 1.30 am local time, and a nearby plane, contacting MH370 just after that to remind it to call into Vietnamese air control only heard static and some mumbling. However, as of today, the revised timeline is apparently as follows.

1:07 - ACARS ping (from automated system)
1:19 - “All right” (said by one of the pilots)
1:22 - Transponder quits (quits, which doesn’t mean it was switched off deliberately)
1:37 - no ACARS ping (just that, no ping, make of that what you will).

So, some of the times have changed, and the sequence looks different. However, until the precise sequence is confirmed speculation will only lead to infinite confusion. If a transponder quits, then that is all one can say, unless one has a means of distinguishing between deliberate and accidental causes of a transponder quitting.

It looks as if we have a communications problem somewhere in Malaysia. Fronted by the government minister for Defence and for Transport Hishammuddin Hussein, by background a lawyer from a political family, the authorities seem to be juggling their own national data (from radar); with the data reported or not reported from the radar systems in other countries; with the insights from Boeing; with the insights from American investigators, including the FBI and intelligence agencies; with insights from French investigators (see Air France 447); with data from Inmarsat; and with the understandable wish to look good in the eyes of their electorate and the wider world. This cautious lawyerly approach need not be sinister, and can of itself avoid fanning rumour, but in conjunction with technical matters which need explanation it seems to have led to some avoidable confusion. Also, all the above groups sometimes give their opinions “off the record” to trusted journalists. Confusion squared.

The arrival of bereaved and angry relatives was to be expected. They are grieving without bodies, always very hard, and coping with a confusing story which keeps their hopes alive and makes their rage at fate turn into rage at these halting communicators of a dreadful mystery, which requires politicians to evaluate and communicate some very technical scientific details, of which probability, inference, and error terms are an important part.

Meanwhile, in the light relief section of the Press,  The Blonde has surfaced in Australia, saying that she had no idea that spending the entire flight with another blonde friend in the cockpit chatting with the male pilots constituted any sort of procedural risk. 

It may be time for everyone to get out of the way and read Arthur Conan Doyle’s short story “The Lost Special” about a train that disappears. Here is the link:

In that little story is the line which has become very well known: “It is one of the elementary principles of practical reasoning that when the impossible has been eliminated the residuum, HOWEVER IMPROBABLE, must contain the truth.”

Let’s hope those poring over the raw data will be able to eliminate the impossible, and then explore the residuum.

Regular guys: the pilots on flight MH370


I do not think I am the only one, but my working life has been distracted by a missing plane. Last Friday night I turned down a chance to be on TV talking about it. I had an alibi: pressing work, work that presses on me still. Yet here I am.

The first news about the missing plane seemed to lead to a straightforward interpretation: a bomb or missile or massive structural collapse had caused a plane to be blown out of the sky. Modern aeroplanes do not fall out of the sky, or certainly not at any perceptible frequency. Debris would be found soon, I believed.

The distraught relatives waiting for a “delayed” plane is a modern tragedy: the incomprehensible delay, the cruel uncertainty, the lack of bodies, the crowd of stunned strangers bereft by the loss of the parallel strangers who sat next to each other on the plane. And the Press, on our behalf, asking them pointless and intrusive questions, and recording their inchoate grief and rising anger.

Then, in slow motion, I watched another tragedy unfold: the gruesome detective story, into which I fell. Three levels: the government story (slow, measured, sometimes evasive); the technical story (faster, far more detailed, sometimes contradictory) and the media story (wild, speculative, interesting, informative and some of it right).

Speculation is what we are told we should not do. We should wait for facts. However, speculating is part of being intelligent. Indeed, it is one of its core features. A predator passes behind an obstruction and we speculate which side of the rock they will come out again. Getting the prediction right helps keep us alive. Speculating is what leads us to find out how things work. Puzzles intrigue us because we are curious. We see it as our business to seek for an answer, and begin to distrust the answers we are given. Good. The Enlightenment continues.

In a different sort of way, speculating is what the accident investigators do (systematically, based on previous cases and industry knowledge, and bound by the laws of physics). Speculation throws up a lot of rubbish, but also creates scenarios which can be examined and rejected, or which serve as a starting point for new possible explanations. The trick is to speculate and then evaluate. The scientific procedure, no less. Warmly welcoming hypotheses, and then coolly dismissing those that don’t make the grade.

Another feature which assists speculation is that the Press often have better contacts. They get past the official story, and on to the real stuff. The “received and official” pronouncements leave out the rough edges of reality. For example, the exemplary young pilot was attracted by pretty blondes. In this joined up age the blondes provided photographic proof of their flirtation with him in the cockpit. Incidentally, will they be arrested for distracting a pilot from his duties? The older pilot posted on YouTube, and had political opinions. Worse, he is a geek with a home-built flight simulator.

Is having a home simulator prima facie evidence of mental disorder? James Reason, of Human Error, might argue that simulators assist the corruption of reality. Simulators were probably involved in the worst ever airplane crash in Tenerife in 1977. The most experienced KLM pilot, with most simulator hours, started his plane too soon, before getting permission to take off. He crashed into a taxi-ing plane slowly clearing the fog bound runway, and 583 people died. He was used to ignoring those unnecessary control tower delays so as to save time when training young pilots in the expensive simulator.

A home simulator might encourage fantasies of being a fighter pilot and of flying fast over hilly terrain, avoiding enemy radars. It might allow the rehearsal of landings in far away airports, off the Malaysia Airlines beaten track. It would add a layer of extra skill to even a skilled pilot, who could attempt manoeuvres never allowed in civilian flight. Did the more experienced and older pilot play so many combat fighter games that, at a personal moment of anger or despair, he wanted to try them out for real? Or will it turn out to be no more than a hobby? I see it as a bit more than a mild obsession.

The official story about the pilots was reassuring. After all, they are Malaysians, and work for Malaysia Airlines. All flag carrying airlines carry an extra burden of national pride. If the airline was called Anyplace Airlines, fewer feathers would have been ruffled. That aside, a large number of friends attested to the essential normality of both pilots. The older one was portrayed as an altruistic man, and his political opinions nothing out of the ordinary; the younger man, blondes aside, mild mannered and about to get married to a long time pilot girlfriend. So far, no smoking gun has been found that would distinguish these two from other airline pilots in Malaysia. However, neighbours of major criminals often find they have nothing much to say about the perpetrator beyond “He kept himself to himself”. What may lurk beneath the calm exterior, etc ad nauseam? On the contrary, these guys seem social, engaged, normal.

And yet, after days of confusion, it is belatedly clear that the events seem due to deliberate action by a pilot of some sort: the older one, the younger one or another hypothetical one. My control tower informant (not in Malaysia) says: “You put your faith in the guys in the cockpit. Malaysian Airlines is deeply in debt. They just bought A380s (6), so they are even more in debt. Perhaps they were starting a redundancy programme, and that got to one of the pilots”. He was clear that the pilots were involved, even if he could not be sure of their motives. He told me to read some pilot websites like Flight Global.

The whole story is a festival of d prime, ROC curves, and bewildering noise. Inmarsat obviously should have run the satellite data for the previous few days to refine their interpretation of the MH370 handshake signals (and probably they have done just that). Is it another case of epi-genetics? Fluff on the toffee that tells us more than the toffee can? It certainly presents us with a very interesting puzzle in Bayesian terms: what can we deduce from the lack of an answer, when that sliver of non-response gives you the angle the satellite was at when it did not receive its expected answer, but only the merest blip of a non-answer. Codes are broken this way.

The other problem is the assumption that there can be only one narrative. For example, a terrorist is assumed to have only one narrative, which is to complete the murderous mission. In fact, a proportion of them change their mind and give up, either well before or during the early stages. They dump their bombs somewhere, and go off to have a coffee like everyone else.

A “deranged” pilot might start out tentatively, and then get deeper into his dilemma. One narrative might take over from another. A wild prank, a small protest might lead on to dawning shame, depression and suicide. Perhaps our putative deranged pilot could not face the shame of coming back to admit he strayed into being a boy racer, angry with the government, who had diverted a plane as a protest. One pilot may have asked the other to go out and look at the wings or something, and then been safe in his armoured cockpit. A wannabe pilot may have talked her way in. Perhaps. All this seems fanciful, but possible.

I was once helpless in a large commercial plane with a deranged pilot. He ignored standard procedure, and flew his 200 passengers slow and low over the beaches of Uruguay, to general surprise and some amusement. The opinion expressed by those near me was that he was trying to impress his girlfriend on one of the beaches. It was a Spanish airline, after all. Cruise liners have been lost for less. The other opinion is that he had recently bought land for a beach home, and was trying to find the damn thing from the air. After half an hour or so, he tired of the tour, and we rose up to normal altitudes, appreciative of our little scenic outing.

Despite pressing deadlines, I took a short break on Saturday to go to lunch with friends. For no particular reason I called the very talkative luncheon company to silence to set my novelist hostess a challenge: that she should write a novel which began with a wife saying to her husband that he was useless and had never done anything of interest, only for him to reply “I am not going to stay here arguing. I have a plane to fly”.

I assure you that I do not usually see novels as a basic ingredient of accident investigation, but in fact the scenarios that accident investigators must examine often contain sequences which would look unbelievable in a novel. We have to speculate, because one of our speculations might make us examine new possibilities. Might. The speculations about Air France 447 were only confirmed when the black box was found, though the basic outlines were there. Snapshot: a pilot was scared by an electrical storm and the absence of speed indicators, and pulled up the controls, causing a fatal stall which, because of the nature of the controls, was invisible to the returning captain who rushed into the cockpit to help.

So, what really happened to flight MH370? Now, there’s an intelligence test item.

An awesome response to paradigm shift

Neuroskeptic asks whether there is an ordering of hyperbolic statements, and Matthew Hankins notes the popularity of “paradigm shift”.

Is the hyperbolic inversely related to fatuity?





Does peer review give too much power to malcontents?


Peer review now has sacrosanct status. It is seen as equivalent to quality control in licenced medicine: a guarantee that the product will do you no harm, and that it may very probably do you a lot of good. It is sold as the gold standard, separating the precious metal from the dross, ensuring that everything which goes through the review process is of the highest standard.

This perspective is beloved of academic publishers, whose authors write for nothing (indeed, they are indentured labourers in academia) and whose reviewers review for nothing, and then the publications are sold for extortionate sums. $35 for one academic paper? You could buy a meal, a newspaper, a magazine, a romantic novel and still have change for several coffees.

Worse, anonymous reviewers can exercise power without responsibility: “the prerogative of the harlot throughout the ages”. They can bitch, spit, claw and slash, till the original work is in tatters. The supplicant, seeking promotion or mere survival, concedes all, and puts his name to a paper the reviewers have written for him, making him say things he does not believe, and commonly, cannot stomach. As the published papers accumulate he advances up the academic ladder, and looks forward to getting his revenge, either on his reviewers if he has found out who they are, or on his worst rivals, the bright young things snapping at his heels. The cycle of disparagement and suppression of contrary imaginations continues.

It is not all bad news. Some papers are rightly rejected; many are improved; some reviewers are kind-hearted, encouraging, helpful; it is even possible that some of the standard expressions of authorial gratitude to nameless reviewers are heartfelt. Anonymous review encourages honesty as well as spite. Sharp criticism may lead to great scholarly effort. It may also lead to some authors taking up farming, to the great benefit of academia, if not always to farming.

However, there is a quicker way to do all this. The authors could circulate their paper to friends, and incorporate some of their suggestions. Then they could post it up on an open access website and invite reviews, thus getting several different public perspectives. It would be a more open and complete procedure. It would also be much faster. It would still be peer review, but with accountability and with far better metrics. The reviewers would be able to build up a profile: fair minded/usually fair minded/harsh/poisonous. Reviews could be counted, assessed for quality as above, and counted towards academic output in an open way. The way authors struggle to deal with criticisms could also be seen in an open way. Above all, no author could ever complain that one of their ideas was strangled at birth because of the psychopathy of a few anonymous critics.

This posting was not peer reviewed. Would you like to do so now?

Festschrift and time shift

Yesterday afternoon, to the Old Refectory at UCL to attend a celebration of the work of Prof Graham Scambler, a long time colleague and friend. Five lectures on social theory and health, with an audience composed of fellow sociologists, former students, and his family and grandchildren.

The pleasures of academia come from discovery and influence: finding out new things; hearing from a student that they were inspired by a lecture or book; noting that a paper written long ago still has some impact; or that a new journal has finally established an academic niche. In academia such feedback is often much delayed, partly by publication processes which may run to a year and partly by the slow rate at which new publications find their way into student textbooks. At this particular celebration the former students who gave talks had achieved professorial rank, but still remembered their origins and their path to increased understanding. Although several joked about his vast library, only one speaker mentioned that Graham was primarily an intellectual,  but the dread word passed quickly without causing embarrassment to English sensibilities. In a sense all his students had been drawn in to his ambit by a single finding from his PhD thesis, which was that the social and personal impact of being diagnosed as epileptic was often greater than the medical severity of the condition.

Reflecting on the talks, Graham noted that he had somewhat marred the event by not yet being dead. He spoke about a central dilemma of sociology, which is the tension between investigating social forces and trying to change them. Psychology does not lack applied practitioners, but sociology is awkwardly poised between those who advise governments (at least one speaker was involved in health policy and grant allocation) and those, like Graham, who between publications want to man the barricades.

As a coda, as the speakers gave their accounts, 40 years of academic life flashed by: the realisation in the early 70’s that traditional medical education had many shortcomings from the patient point of view, leading eventually to the sudden recruiting of sociologists and psychologists to try to make a difference; all this sullenly accepted by medical schools who doubted that the experiment would work, and resented the reductions in their teaching hours. The students were almost uniformly male, and thought of medicine as a refined form of rugby. To defend themselves, the new entrants wrote text books and set exam questions based on them. Our group at the Middlesex Hospital Medical School (two psychologists, two sociologists) took the unusual step of collaborating on a book “The Experience of Illness” (1984) which brought together psychological and sociological perspectives (to give you a flavour: “The interview is the one thing which distinguishes medicine from veterinary surgery”) and which in turn launched a dozen monographs. We won the initial battles, recruited students to our Intercalated BSc courses, marked exams, started research. And now, as the decades have passed, behavioural sciences courses have got shorter, the lecturers fewer, exam time far shorter, and vacant posts are often not filled. The wave has passed.

Then drinks in the Haldane room and the inevitable halting ramble through city streets of a small gaggle of academics trying to find their way to an Indian restaurant which was just round the corner, somewhere.  In all, a very English celebration: low key, friendly and irreverent, and no evasion of differences amidst wry, amused reflection.

In the words of the 1968 Mary Hopkin song: “Those were the days, my friend, we thought they’d never end”, the days of hope and very earnest lectures which were going to change the face of medicine.

Digit Span: the modest little bombshell

Digit Span must be one of the simplest tests ever devised. The examiner says a short string of digits at the rate of one digit a second in a monotone voice, and then the examinee repeats them. The examiner then tries a string which is one digit longer, and continues in this fashion with longer and longer strings of digits until the examinee fails both trials at that particular length. That determines the number of digits forwards.

Then the examiner explains that he will say a string of digits and the examinee has to repeat them backwards, that is, in reverse order. For example, 3 – 7 is to be said back to the examiner as 7 – 3. This continues until the examinee fails two trials at a particular length, which determines the number of digits backwards.
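For readers who like to see a procedure spelled out, here is a minimal sketch of the administration rule just described. It is only an illustration: the starting length, the maximum length, and the toy examinee with fixed forwards and backwards capacities are assumptions of mine, not the details of any published test manual.

```python
import random

def administer_span(respond, backwards=False, start_length=2,
                    max_length=12, trials_per_length=2, seed=0):
    """Return the longest length at which at least one trial was passed.

    `respond(digits, backwards)` is a hypothetical stand-in for the examinee
    and returns the list of digits they say back.
    """
    rng = random.Random(seed)
    span = 0
    for length in range(start_length, max_length + 1):
        passed_any = False
        for _ in range(trials_per_length):
            digits = [rng.randint(1, 9) for _ in range(length)]
            # Backwards trials are scored against the reversed string.
            expected = digits[::-1] if backwards else digits
            if respond(digits, backwards) == expected:
                passed_any = True
        if not passed_any:
            break  # both trials failed at this length, so testing stops
        span = length
    return span

def toy_examinee(forward_capacity, backward_capacity):
    """Assumed examinee: perfect up to a fixed capacity, lost beyond it."""
    def respond(digits, backwards):
        capacity = backward_capacity if backwards else forward_capacity
        if len(digits) > capacity:
            return []  # gives up on strings beyond capacity
        return digits[::-1] if backwards else list(digits)
    return respond

examinee = toy_examinee(forward_capacity=7, backward_capacity=6)
forwards_span = administer_span(examinee)
backwards_span = administer_span(examinee, backwards=True)
print(forwards_span, backwards_span)
```

A real examinee is not so tidy, of course: they pass some trials and fail others near their limit, which is exactly why two trials are given at each length.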

I hope you will agree that this is a simple test, easy to understand, and largely bereft of any intellectual content. All you need is: to know the names of single digits, and to understand the simple instructions and examples given so that you repeat the digits forwards, and in the later version of the test, backwards. In particular, if you can do digits forwards you reveal you know your digits and have some memory, and if you can do a short string backwards you reveal that you have some memory and you understand the idea of repeating digits backwards.

The test is not only bereft of intellectual content, but is also low on cultural content. Once you have learnt digit names you are ready to do the test. I assume that forwards and backwards are concepts understood by all cultures worthy of the name.

Initially, test constructors regarded the test as an optional extra, because test-retest reliabilities were low. Arthur Jensen pointed out that this was simply because not enough trials were used. Once extra trials are provided, Digit Span becomes a good measure of general intelligence, correlating with g at 0.71.  Of course, Wechsler being Wechsler, they have also included some new tasks in Digit Span, in which digits are read to the examinee and have to be remembered back in order of magnitude, but we can leave that out for the time being, since it does not affect the central comparison between digits forwards and backwards.

How does digit backwards have this profound effect? Short term memory is just an auditory store. Most of the intellectual demand comes from digits backwards. That simple little task of remembering the forward sequence, and then keeping it in mind while reading off the sequence in reverse order taxes the mind. Digit backwards spans are usually at least a digit shorter than digits forwards. If someone can remember 7 digits forward (the average adult score) but only 6 backwards (the average adult score), that is a 14% reduction in memory capacity. (At age 11 for white kids the reduction is 23% and for black kids 30%, as shown below). Digits forwards are related to g, but digits backwards are even more loaded on g.

How does this finding relate to the vexed question of group differences? Well, it is hard to give a plausible cultural explanation for the effect, unless you stretch the concept of culture to absurd lengths. Could there really be a culture in which there are numbers but no reversible operations? Even if there were a culture or putative sub-culture in which using numbers was discouraged, it should affect all digit tasks, not just digits backwards. (What name would one give to a culture in which number use is discouraged?)

If any group defined in genetic or cultural terms has a particular difficulty with digits backwards this is a strong indicator that they have difficulty with tasks as they get more intellectually demanding. The higher the g loading the more they should differ from brighter groups.

Hence the great interest in the most recent scores, to see if they conform to the usual pattern described by Jensen in the G factor (p. 405, referring to work he did in 1975 with Figueroa, ref on p 614). Over at Human Varieties, Dalliard has tried to replicate those results using data from CNLSY (these are the children of the female participants in NLSY79). Incidentally, this is a great follow-up survey: “My Mummy did your tests before I was born”. Gradually we are getting to understand the transmission of intelligence through the generations.

[Chart: h-b-w (Hispanic, black, white) digit span results]

The chart shows the increase in digit span with increasing age, and the nature of the gap between digits forwards and backwards in the different groups. This is clearer in the second table, which shows the gaps as Cohen’s d.

[Table: CNLSY digit span racial/ethnic gaps]

Incidentally, the fact that Hispanics have a slightly lower digit forwards score than whites and blacks but reasonable digits backwards slightly reduces their gap between the two conditions.
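For anyone unsure what a gap expressed as Cohen’s d amounts to, here is a minimal sketch: the difference between two group means divided by their pooled standard deviation. The two lists of spans are invented purely for illustration and are not CNLSY data, and I am assuming the usual pooled-variance version of d, which may not be exactly the calculation Dalliard used.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)

# Invented digit spans for two small groups (illustration only).
spans_a = [5, 6, 6, 7, 7, 8]
spans_b = [4, 5, 5, 6, 6, 7]

gap = cohens_d(spans_a, spans_b)
print(round(gap, 2))
```

The point of d is that it puts gaps on different subtests onto the same footing, so a forwards gap and a backwards gap can be compared directly.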

Dalliard says: “That the black-white gap on forward digits is substantially smaller than on backwards digits is a robust finding confirmed in this new analysis. This poses a challenge to the argument that racial differences in exposure to the kinds of information that are needed in cognitive tests cause the black-white test score gap. The informational demands of the digit span tests are minimal, as only the knowledge of numbers from 1 to 9 is required. Forward digits is a simple memory test assessing the ability to store information and immediately recall it. The informational demands of backwards digits are the same as those of forward digits, but the requirement that the digits be repeated in the reverse order means that it is not simply a memory test but one that also requires mental transformation or manipulation of the information presented.”

It is good to have a replication of a well-established and informative finding. However, Dalliard has pushed the analysis further, with a factorial study which suggests that black kids have a slight short term memory advantage which is enough to overcome the g demands of digits forwards, but not enough to cope with the higher g demands of digits backwards. This is a new finding which could lead to further studies.

Read the whole thing here

Finally, the really engaging feature of digit span from a psychometric point of view is that it is a true scale with a true zero. If you cannot remember any digits, your score is zero and that corresponds to zero digits. If you can remember 4 or 5 or 6 or 7 digits those are real scores, and the intervals between them are identical. So, for purists, this is an interval scale with a true zero like the Kelvin scale, where 0 Kelvin is absolute zero. Nothing is colder than that. Age in years is also a true scale.

At this point, it would be normal to explain what psychologist S S Stevens called it in his 1946 proposed typology in Science. Why on earth should I do that? You already understand the notion of a true scale with a true zero, where the intervals are truly each as big as each other. What more do you need to know? If someone says that IQ isn’t a real measure because “a quotient is all relative” please tell them a thing or two about digit span.

Ratio. I didn’t want you to waste time looking it up.

Become an instant expert on intelligence


Readers will know that I sometimes toy with the idea of writing a “Boost your IQ” book, which will also have an associated training course, expensive test materials, and very possibly lengthy seminars in international beach resorts. Trouble is, one would have to write the damn thing.

Then, whilst going through Linda Gottfredson’s website on another matter, I remembered she had written a very good “Instant Expert” piece in 2011 for the New Scientist, which covers all the main findings: the different types of intelligence; what intelligence tests measure; what is intelligence; quantifying intelligence; age effects; brain localisation; what makes someone smart; nature and nurture; realising your assets; simplifying your world; boosting brainpower (YES); cognitive enhancement; and are we getting smarter.

This publication is guaranteed to boost your intelligence, so long as you accept that increasing your knowledge might count as boosting your crystallized intelligence. What is more, it is freely available on her website. I understand that, on the basis of effort justification, you would like to pay a large sum of money and do N-back training for 20 hours, but why not take the intelligent short cut, and spend 20 minutes reading it, and then send it to a colleague who claims not to understand the concept?

Funny looking kids and the normal distribution


Much shorter version:

Say, through the mischance of illness, birth injury or genetic disorder, 1% of all children are born with something which damages body and mind, such that they look odd in some way, and fall below IQ 70.

Say, through the normal variation of genetic inheritance some children with normal undamaged bodies and minds also fall below IQ 70.

Then in the case of white kids the proportions will be 1% funny looking and 2% normally backward, so 1 in 3 looking odd.

In the case of black kids the proportions will be 1% funny looking and 16% normally backward, so 1 in 17 looking odd.

Black kids below IQ 70 will usually be normal looking, white kids less frequently so.

What does IQ 70 mean for black and white kids?

Arthur Jensen was always a paid up member of the select club of psychologists who actually give intelligence tests, as well as writing about intelligence. He was an educational psychologist who believed that every child could learn, and wanted them to have a cafeteria of learning choices, not an inflexible school set meal. He was fascinated by human intelligence, and bewildered by the vituperation of those who were not, and who shouted down his observations. He was thorough in his work, honest in giving his opinions, and steadfast in letting the results have pride of place. He was also very bright. Like many bright people, he assumed that you were probably bright, which was kind of him.

His first observation was to admit that when he tested some children he came out of the assessment session convinced that they were bright. Then he would add up the scores and find that their results were pretty modest. Intrigued by his mistake, he worked out that they had been socially skilled, had presented themselves well, and had fooled him with their charm. Typically, he went on to say that he did not doubt that they would do well in life, because aspects of character other than intelligence can have an influence on success. Equally, he did not cover up his mistake in estimating intelligence, but reported it with interest. He was certainly not a person to boast “I can tell a person’s IQ at a glance”. Early on he noted that presentation during testing was not always an accurate guide to actual mental ability.

Then he took his observations further. He noted that when he tested Black kids of say, IQ 70 they came across as normal and their behaviour in the playground appeared normal. That is, they related pretty well, had the sorts of interests that other children had, and seemed to be street wise. Seemed to be. They weren’t always able to explain the rules and scoring systems of the games they played, so it depended on how closely you examined their understanding. White kids at IQ 70 were often slightly odd. They were sometimes funny looking in their appearance, and more difficult to make a relationship with. They were often somewhat naive.

One explanation for the difference is that intelligence tests do not accurately measure black children’s intelligence. Jensen wrote a book on this topic in 1980 “Bias in Mental Testing” so you can look up the data and his argument in that text. In a nutshell, he found that the tests did not underestimate black intelligence. In fact, they very slightly over-estimated it. He was also doubtful that there were such separate things as black  intelligence or white intelligence. In his view there was human intelligence, and the results showed that both people and groups differed in how much they had. (That is something of a simplification, because he also showed there were some differences, not least in the distribution of full scale scores, with black respondents having a more slender distribution, white respondents a more broad and “normal” distribution).

What other explanations are possible, other than test bias? Jensen argued that black children of IQ 70 were normal. It was not an illusion. Assuming a black IQ mean of 85, IQ 70 is but one standard deviation below the black mean. Nothing special, and not specially bad from that population point of view. A full 16% of black kids in the US are below one standard deviation for the black population. They were normal black kids. In fact, in terms of normality alone, though not in terms of ability, they were rather like white kids of IQ 85 who are one standard deviation below the white IQ mean of 100. Nothing special. Normal white kids. A full 16% of white kids are below one standard deviation of the white intelligence mean. Only about 2% of white kids are below two standard deviations, IQ 70, whereas 16% of black kids are (or were, see below).

In summary, IQ 70 is minus 2 sigma for whites but only minus 1 sigma for blacks. Being below IQ 70 is rare for white kids (2%) and pretty common for black kids (16%). Jensen pointed out that there were two routes to mental retardation: 1) simply being at the lower part of the intelligence distribution; and 2) having something wrong with your brain. So, some white kids are retarded because of some injury or illness or genetic disorder, which also makes them “funny looking”, plus some are just naturally dull. A larger proportion of  black kids are naturally dull,  and some (proportionately fewer) have had an injury or illness or genetic disorder.

Hence, the difference in apparent normality is real, and is explained by a careful understanding of normal variation in each population.

Can we test this explanation? Apart from checking all the facts (which appear to be correct) we have a new development in the last decade or two. It seems that average black IQ in the US is now about 90. If that is so, fewer black children will fall below IQ 70, the cut-off that lies 2 standard deviations below the white mean. IQ 70 will be 20 points below the mean for them, rather than 15 points below the mean, so probably only 9% of black children will fall below that particular cut-off point.

As a consequence the proportion of funny looking black kids in the under IQ70 range should have gone up a bit in the last two decades, not because black kids are getting genetic disorders, but because there will be somewhat fewer normal looking backward black children. There will still be proportionately more funny looking white kids than black kids below IQ 70, but the different rates will not be so striking as before.
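The percentages above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: scores are treated as normally distributed with a standard deviation of 15 in every group (a simplification, given the narrower black distribution mentioned earlier), and the 1% of children with some damaging condition is simply added on top of the normal tail, as in the short version at the start of this post.

```python
from math import erf, sqrt

def share_below(cutoff, mean, sd=15.0):
    """Proportion of a normal distribution falling below `cutoff`."""
    z = (cutoff - mean) / sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def odd_looking_share(normal_share, damaged_share=0.01):
    """Among children below the cutoff, the fraction who are 'funny looking'.

    Assumes the damaged 1% is added on top of the normal tail.
    """
    return damaged_share / (damaged_share + normal_share)

for group_mean in (100, 85, 90):
    below = share_below(70, group_mean)
    odd = odd_looking_share(below)
    print(f"mean {group_mean}: {below:.1%} below IQ 70, "
          f"{odd:.0%} of those below the cutoff look odd")
```

Under these assumptions the tail below IQ 70 comes out at about 2% for a mean of 100, about 16% for a mean of 85, and about 9% for a mean of 90, which matches the figures in the text. The odd-looking share rises from roughly 6% to roughly 10% as the mean moves from 85 to 90.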

By the way, people with IQ 70 can do lots of things. Humans are spectacularly intelligent even at 2 sigma below the Greenwich Population Mean.  A great deal can be achieved, even in a group who, compared to everyone else, are considered to be at high risk.

Finally, the observation that a child can have difficulties either because they are naturally dull or because they have experienced some adventitious insult to their otherwise normal abilities, is not new. Here is a researcher estimating the number of backward children in a population and expressing himself in forthright language:

“We have seen [] that there are 400 idiots and imbeciles, to every million of persons living in this country; but that 30 per cent of their number, appear to be light cases, to whom the name of idiot is inappropriate. There will remain 280 true idiots and imbeciles, to every million of our population. [] No doubt a certain proportion of them are idiotic owing to some fortuitous cause, which may interfere with the working of a naturally good brain, much as a bit of dirt may cause a first-rate chronometer to keep worse time than an ordinary watch. But I presume, from the usual smallness of head and absence of disease among these persons, that the proportion of accidental idiots cannot be very large”.

Who will be the first to provide the name of the author, the title of the work, and the page number?