Tuesday, 11 February 2014

The many-headed Hydra of alternate intelligences


Some stories never die. They serve a purpose: to distract, explain away, assuage a fear or, in this particular case, to make us feel better about ourselves. It is a variant of the seductive story that the examiners did not mark your papers correctly, and that other examiners would have rated you more highly. This is always true, to some extent, because we can all shop around for an assessment which gives us more flattering results, like choosing the best photo and discarding the unfavourable depictions.

Last March I posted an item about “tests of rationality” being championed in a science magazine, which tried to generate interest by talking about “popular stupidity”. http://drjamesthompson.blogspot.co.uk/2013/03/popular-stupidy.html

Now in “The Psychologist”, a magazine published by The British Psychological Society, Keith Stanovich and Richard West have written an article “What intelligence tests miss” suggesting that intelligence tests neglect to measure “rationality”. They are trying to create a test of rationality using Kahneman and Tversky’s problems, together with others collected by the late lamented Robyn Dawes and subsequently brilliantly dissected by Gerd Gigerenzer. This latest escapade strikes me as the recycling of Gardner’s Multiple Intelligences, in the form of: Alternative Intelligences (Seriously and Rationally).

The hidden implication is that if you are smarting at a disappointing result on an intelligence test you might be better off taking a rationality test, which could give you a more accurate, or at the very least broader, assessment of your wide ranging mental skills, not to say your fundamental wisdom.

IQ has gained a bad reputation. In marketing terms it is a toxic brand: it immediately turns off half the population, who are brutally told that they are below average. That is a bad policy if you are trying to win friends and influence people. There are several attacks on intelligence testing: the frontal attack is that the tests are no good and best ignored, while the flanking attack is that the tests are too narrow, and leave out too much of the full panoply of human abilities.

The latter attack is always true, to some extent, because a one hour test cannot be expected to generate the complete picture which could be obtained over a week of testing on the full range of mental tasks.  However, the surprising finding is that, hour for hour, intelligence testing is extraordinarily effective at predicting human futures, more so than any other assessment available so far. This is not entirely surprising when one realises that psychologists tried out at least 23 different mental tasks in the 1920s (including many we would find quaint today) and came to the conclusion that each additional test produced rapidly diminishing returns, such that 10 sub-tests were a reasonable cut-off point for an accurate measure of ability, and a key 4 sub-tests suffice for a reasonable estimate.

So, when a purveyor of an alternative intelligence test makes claims for their new assessment, they have something of a mountain to climb. After a century of development, intelligence testers have an armoury of approaches, methods and material they can bring to bear on the evaluation of abilities. New tests have to show that they can offer something over and above TAU (Testing As Usual). Years ago, this looked like being easy. There is still so much unexplained variance in ability that there was great confidence in the 60s that personality testing would add considerable explanatory power. Not so. Then tests of creativity were touted as the obvious route to a better understanding of ability. Not so. Then multiple intelligences, which psychology textbooks enthusiastically continue touting despite the paucity of supportive evidence. Not so. Then learning styles. Not so. More recently, emotional intelligence produced partial results, but far less than anticipated. Same story for Sternberg’s practical intelligence. The list will continue, like types of diets. The Hydra of alternative, more sympathetic, more attuned to your special abilities, sparkling new tests keeps raising its many heads.

What all these innovators have to face is that about 50% of all mental skills can be accounted for by a common latent factor. This shows up again and again. For once psychology has found something which replicates!

The other hurdle is that nowadays there are very demanding legal requirements placed upon any test of intelligence. You have to have a proper representative sample of the nation, or nations, in which you wish to give the test. Nationally drawn up samples of 2000 to 2500 are required. Not only that, but you generally have to double sample minorities. You also have to show that the items are not biased against any group. This is difficult, because any large difference between the sexes or races is considered prima facie evidence of bias. Indeed, if there are pronounced, very specific differences between the mental abilities of the sexes or of racial groups, such findings have been discarded for the last 50 years, at least as far as intelligence testing is concerned.

The conceit of the new proposal is that rationality is a different mental attribute to problem solving in the broad sense. The argument is that IQ and rationality results are poorly correlated (.20 to .35) in university students. To my eye, given the restriction of range (even at American universities which take in a broad range of intellects in the first year) this is not a bad finding. I say this because the authors do not yet have a rationality test. They seem to be correlating scores on a many-item IQ test with the scores on a few pass-fail rationality problems. This lumpiness in the rationality measure needs to be sorted out before we can say that the two concepts are independent.
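For readers who like to see restriction of range at work, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the population correlation of 0.6, the selection cut-off of one standard deviation above the mean, and the sample size are invented, and none of them comes from Stanovich and West's data. It simply shows that a correlation measured only in a selected group comes out lower than the same correlation measured across the whole population.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_restriction(true_r=0.6, n=20000, cutoff=1.0, seed=0):
    """Return (r_full, r_selected) for two standard-normal scores.

    true_r, n, cutoff and seed are illustrative assumptions only.
    """
    rng = random.Random(seed)
    x_all, y_all = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # y shares variance with x so that corr(x, y) is roughly true_r
        y = true_r * x + math.sqrt(1.0 - true_r ** 2) * rng.gauss(0.0, 1.0)
        x_all.append(x)
        y_all.append(y)

    # Keep only the cases above the cut-off on x, as a selective intake would
    selected = [(x, y) for x, y in zip(x_all, y_all) if x > cutoff]
    x_sel = [x for x, _ in selected]
    y_sel = [y for _, y in selected]

    return pearson(x_all, y_all), pearson(x_sel, y_sel)


r_full, r_selected = simulate_restriction()
print(f"whole population: r = {r_full:.2f}")
print(f"selected group:   r = {r_selected:.2f}")
```

The selected-group figure is the one an investigator sees when sampling only university students, which is why a modest correlation in such a sample says little about the relationship in the general population.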

In fact, when you read their 2009 paper it turns out that they did not give their subjects intelligence tests. They simply recorded what the students told them were their Scholastic Aptitude Test (SAT) totals. I don’t wish to be too hard, since of course scholastic ability test results are largely determined by intelligence, but since the authors go on to talk about “what intelligence tests miss” I think they ought to say “what self-reported scholastic achievement test scores miss”. In fact, even that is wrong, because the word “miss” implies a fault in the original aim. So, what they should have called their later book is “some tasks don’t correlate very strongly with what university students self-report about their scholastic achievement test scores”. As you will note, I am in favour of catchy titles.

That aside, the authors note that if the “rationality” task allows you to guide your choices by doing a calculation (deciding which of two trays of marbles has the higher probability of producing a black marble which gets a reward) then the correct choice is made by brighter students (SAT scores of 1174 versus 1137). This test provides only a pass/fail result, like so many of these “rationality” puzzles, so it does not easily fit into psychometric analysis.

By now, dear readers, you will have worked out the main difference between intelligence test items and rationality puzzles. The former are worked upon again and again so that they are as straightforward and unambiguous as possible. If a putative intelligence item is misleading in any way it gets dropped. Misleading items introduce error variance and obscure the underlying results. Also, if particular groups are more likely to be misled, then their lawyers can argue that the item is unfair to them. All those contested items do not make it to the final published test.

Rationality puzzles, on the other hand, can be as tricky as possible. They are not “upon oath”. If a particular symbol or word misleads, so much the better. If the construction draws the reader down the wrong path, or sets up an incorrect focus of attention, that is all part of the fun. Gigerenzer did some of the best work on this. He looked at the base rate problem beloved of previous investigators, and at all the difficulties caused by percentages with decimal points and all the rest of it, and then proposed a solution (this is unusual for psychologists). He tested his proposed solution (which was to show the problem in terms of natural frequencies, usually on a base of 1000 persons) and found that it got rid of virtually all of the “irrationality” problem. Much of the “irrationality” effect is due to the problem form not being unpackaged properly. This is not a trivial matter, but it is not an insuperable one. For example, consider the question which Stanovich and West give as an example of irrationality.

A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball. How much does the ball cost?

Most people say 10 cents. This makes sense, because this is the usual way you calculate, in that if you spend $1.10 on a bat and a ball, and the bat costs $1.00 then the ball costs $0.10. It is unusual and somewhat bizarre to put in the concept “a certain amount more than another amount”. The usual answer of $0.10 would be right in most circumstances. This is a special circumstance, and very unusual, in that the concept of “$1 more than” is being used in what appears to be a simple calculation. Respondents use the usual format, without noticing the subtle format change. This change means that you have to work out a sum for the bat and ball, so that when you take the cost of the ball from the bat you are left with exactly $1. It cannot be 10 cents, because if you take 10c from $1 you are left with 90c. So, in this case the ball must cost 5c so that when you take 5c from $1.05 you are left with exactly $1. It may strike you as a bit odd, and somewhat tricky and pedantic, and you would not be wrong in making this judgment.
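For anyone who wants to check the arithmetic rather than take it on trust, here is a small sketch that tests a proposed ball price against both statements in the puzzle. The function names and the choice to work in whole cents are my own illustrative assumptions, not anything taken from Stanovich and West.

```python
def satisfies_puzzle(ball_cents, total_cents=110, difference_cents=100):
    """Return True if a ball price fits both statements in the puzzle."""
    # "The bat costs $1 more than the ball"
    bat_cents = ball_cents + difference_cents
    # "A bat and a ball cost $1.10 in total"
    return bat_cents + ball_cents == total_cents


def solve_ball_price(total_cents=110, difference_cents=100):
    """Solve ball + (ball + difference) = total for the ball price."""
    return (total_cents - difference_cents) // 2


print(satisfies_puzzle(10))  # False: the bat would be $1.10 and the total $1.20
print(satisfies_puzzle(5))   # True: $1.05 + $0.05 = $1.10
print(solve_ball_price())    # 5
```

The intuitive answer fails only the second check, which is exactly the part of the question that the usual subtraction habit skips over.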

In this particular case the question might be recast as follows.

A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball, meaning that when you take the cost of the ball away from the cost of the bat you are left with exactly $1. How much does the ball cost?

Even the extra explanation might not do the trick, because the usual subtraction sum is uppermost in people’s minds, but they are not being irrational when they make the mistake. They fall for a trick, but they can learn the trick if they have to, or if it seems likely to be useful in the future. In my view the real world implications of this finding are almost zero, other than to highlight how some subtleties and ambiguities lead us astray (and are best avoided in standard examinations). As a sideline, if an aircraft cockpit contains similar ambiguities, they can be lethal, and must be removed for safety reasons.

Similarly, as already discussed, Dawes’ base rate problem disappears when you use natural frequencies. Gigerenzer likened it to being confused about the colour of a car seen under sodium floodlights at night in a car park. In the day time the usual colours were visible again. Strange problem formats (mathematical notation, symbolic logic notation, percentages which include decimal points, decimal points with many zeros, relative versus absolute risks, complicated visual displays in aeroplane cockpits, poorly set out controls in cars) impose an additional load on understanding. Most respondents take a short cut. As a rule of thumb, if you need lots of special training to operate a system, it is badly designed for humans.
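To make the natural frequency idea concrete, here is a minimal sketch that restates a base rate problem as counts of people out of 1000, and checks that the counts give the same answer as the percentage version. The figures used (a 1% base rate, an 80% hit rate and a 10% false alarm rate) are invented for illustration and are not taken from Dawes, Gigerenzer, or any particular study.

```python
def natural_frequencies(base_rate, hit_rate, false_alarm_rate, population=1000):
    """Restate a base rate problem as whole-person counts."""
    affected = round(population * base_rate)
    unaffected = population - affected
    true_positives = round(affected * hit_rate)
    false_positives = round(unaffected * false_alarm_rate)
    return true_positives, false_positives


def posterior_from_percentages(base_rate, hit_rate, false_alarm_rate):
    """The same answer worked out from percentages (Bayes' rule)."""
    numerator = base_rate * hit_rate
    return numerator / (numerator + (1 - base_rate) * false_alarm_rate)


# Illustrative numbers only
tp, fp = natural_frequencies(0.01, 0.80, 0.10)
print(f"Of 1000 people, {tp} affected people test positive")
print(f"and {fp} unaffected people also test positive,")
print(f"so {tp} out of {tp + fp} positives are genuine.")
```

Put as "8 out of 107", the low probability is plain to see, whereas the percentage version requires the respondent to assemble the same fraction unaided.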

The Stanovich and West test of rationality has yet to be constructed, let alone tested on the general population. To show that the test was worth giving it would be necessary to measure what additional benefits it provides over and above Testing As Usual. If the resultant Rationality Quotient proves to be very powerful in predicting human futures, then it can take over the lead position from intelligence testing. What is interesting to me is how much mileage they are getting out of attacking intelligence testing for “what it misses”. All they have done is compare SAT scores with replications of some rationality tests. Described more modestly, I would be on their side, and interested in the results of their replications. They distinguish between the results on different tests, which provides a version of an item analysis. However, they do not show that some tests are better predictors of real life achievements than the SAT scores reported by their university students. And, once again, university students are not the only people in the world, nor are they representative of the mental abilities of the general population. Stanovich and West’s rationality test seems to be a case of premature self-congratulation.

What can one say about a test which has yet to be created, tested, published and compared with established measures of mental ability? Frankly, it would be premature to say anything except: Good luck.


  1. Ha, just before reading this post, I had come across a press release about "another paradigm-shifting idea", this one known as "personal intelligence", which is apparently an "invaluable new framework". Of course, the "internationally recognized researcher" behind this new theory has a book to sell, which I assume he hopes to parlay into lucrative consulting gigs in the education industry.
    And to cynical me, that's the main motivation behind these alternative theories, behind the Golemans and the Sternbergs etc.; not so much an intellectual hostility to IQ, but rather a chance to hit the jackpot.

  2. Had I the time, and the inclination, I would join them: "How to boost your IQ and lose weight at the same time"

  3. excellent post -- educators (aka non-empiricists/g-deniers) love "multiple intelligence" & "emotional IQ." in the last decent IQ documentary (from the late 1960s with a young Dan Rather!) Jerome Kagan says coyly, "AbilitY... or AbilitIES?" knowing full well it's the former, but more money's to be made in the latter (the people want hope - & magic!)

    "executive functioning" is halfway between the sham of emotional IQ/multiple intelligences & the reality of g - it's a grab bag of various high & low g-loaded tasks (until i'm convinced otherwise!) But, a few TBI types score high on IQ yet can't function in the real world - "executive functioning" attempts to explain why - so it's a bit better (& more measurable!) than the execrable emotional IQ/multiple intelligences.

    Rationality puzzles will be too wordy, require too much crystallized g, & will not be psychometrically sound (one won't get equal item gradients between difficulty levels of items - the items are more likely to be biased/predict differently for each group, etc.) they will be useful only to the extent they measure g.

    3 cheers for noticing the restriction of range issue! one hears, "the ACT/GRE doesn't predict achievement well" - oh yeah? try giving them to everybody (the whole normal curve) & let the whole normal curve into those colleges/grad programs. then the correlation will be pretty high!

    side note - in the last 30 years, the GRE went from being an ability test to being an achievement test - both predict grad school performance about the same, but by being more explicitly "reading comprehension & math achievement" - it catches less political flak for having (the ever present) group differences. Yet, ironically, now it's less likely to catch minority "diamonds in the rough."

    side note 2: Das & Naglieri marketed an anti-g test - which publishers bought b/c the authors have good names, but, like the K-ABC of old - group differences are smaller b/c it had less g, & more saliently: it was less accurate a predictor b/c it had less g.

    1. Thank god that someone else shares my views on "executive functioning"; I was beginning to think I was all alone! Damn near every single class this term has included some mention of this frustrating umbrella concept, with little critical scrutiny. But the tests are just rebranded WAIS subtests; there's little to no indication that they're measuring anything that RPM does not. After reading paper after paper where the construct is so uncritically accepted and measured again and again by tests obviously contaminated by g, with no indication that anyone has bothered to look at construct & discriminant validity - ugh.

      Actually, I tell a lie, because Timothy Salthouse did bother to check the discriminant validity - http://www.ncbi.nlm.nih.gov/pubmed/16060828?dopt=Abstract - and the results were not promising for executive function.

  4. Can you talk more about "executive functioning" and suggest some references on how it behaves vis-à-vis intelligence? It is all the rage in medico-legal circles here, because it gives you lots more chances of finding apparent deficits.

  5. I was hoping YOU'd talk about it:) You made a very cogent point about it being simply random things to measure which may be culpable for/capable of explaining - whatever's going on with a person (& for further justifying psychologists existence:)

    I've loathed the concept of "executive functioning" ever since first reading it in muriel lezak's neuropsych text book back in 1987/88 in my doc program (while at the same time reading a real scientist's book - Arthur Jensen's Bias in Mental Testing:)

    Executive functioning has caught on big time & become all the rage among easily duped psychologists & educators - but it has always remained a hodge-podge of scattered skills of varying degrees of g.

    its only validity (that i can see) is as a weak theory as to why some people with TBI do well on IQ tests, but fail to function in society.

    it's a random collection of "managerial skills" such as organizing, planning (often ridiculously measured thru mazes subtests! ahem - that's NOT REAL planning), multi-tasking, evaluating ideas, being patient, etc. all attributed (way too) exclusively to the frontal lobe/prefrontal cortex.

    I agree with your theory that it gives us more deficits to find in our bag of tricks:)

    it's a random grab bag of variables (of varying degrees of "measure-ability") endorsed by people who aren't bright enough (or statistically savvy enough) to understand "g." executive functioning certainly doesn't fit in well with g "theory" /(real psychometrics).

    however, perhaps i am just low on executive functioning & that's my beef with it:)

  6. You have hit one out of the park on this one, Dr.

    As for "executive function", it sounds to me like taking IQ X lack of ADHD traits.

  7. I have ordered Stanovich's book. I'll review it when it arrives and I've finished it.

    1. i rolled my eyes so much reading that book i finally stopped reading, lest i develop permanently rolled eyes :) as is his way, he offers many non-occam's razor alternative (PC) hypotheses to the obvious un PC occam's razored monster in the closet. but, at least he doesn't ignore these things. he tends to come down on both sides of the fence (one side for this, the other side for that). he writes well, tho - & he's a hero for naming "the matthew effect."

      BUT, Elijah, your recent article shows why the Flynn Effect should be renamed the Armstrong-Woodley Effect. & why the idea of secular increase in (actual) intelligence should be thrown on the scrap heap of history.

      meanwhile, back at the test publishers, thru the 1990s, we could bank on 1/3rd of a point rise a year. But now, as 1) demographics change (more low scorers) & 2) we reach saturation limits on rule-learning/test-wiseness points - that increase has leveled off (e.g., DAS-II 2007's renorming only changed the GCA 2.9 points or so - they buried it deep in the manual, so people wouldn't say, "hey, why'd we have to buy a whole new test if the norms didn't change that much?" & also so people wouldn't say "hey, why didn't the norms change very much?") i wish Elijah had been born earlier so he could've cleared up that confusion back in the 1990s for us:)

      i do like how the field had to be schooled by a very smart THINKING teen. & how it unthinkingly accepted a theory that would've made all our great grandparents retarded or intellectually disabled (actually, idiots, morons, & imbeciles, to use the era-appropriate terms:) sorry, grandpa!

    2. i am in error - stanovich named the matthew effect. what was i thinking? still, Flynn's a hero for noticing & dealing with the "secular increase" in IQ scores + it was valuable & accurate info to set your clocks by in the test norming industry. if we actually had been getting smarter we would've been smart enough to figure out why it was happening. fortunately, Armstrong & Woodley came along.

  8. I believe you explain calls multiple intelligences, as well, because humans (still) are not robots. Personality influences deeply on how a person will interact in society and in their own building society. The problem of the tests is that they, again, are mathematical results in something that is complex although it is measurable. (X-man with iq 120 is smarter than Y-man with iq 110??)
    Human beings are not parts, but a whole. Not matter much if you have a person with a super high IQ and an unfavorable personality to work it. Do not you just continue to select for high iq in China, if the result is an emotionally cold and materialistic population.
    Everything that the human being has produced the highest level of their ability comes from their creativity. I've seen several test iq the internet and I'm sorry to tell you but they do not measure creativity, nothing is able to measure what is completely new or unexpected. There is no way to measure the future and creativity is always the future.
    Yes, iq tests are the best means to measure raw intelligence in humans, are excellent for measuring anything based in large groups. Are still very good to measure individual intelligence.
    But are useless for outliers. Because outliers are the future, they are creativity itself. The only deliberate way to measure these types would obtain funds by means of the g factor, which keeps them balanced between extreme madness and extreme rationality tests.


  9. Dr. Thompson I had one direct observation that might help you look into some things, don't want to waste your time even though there is a lot to discuss on the subject. Since you specifically called it out as an example, that one test problem is from what is called the Cognitive Reflection Test.

    That set of items (three quick questions) has actually been used in a lot of studies, made for many papers you'll see with that name and many correlations, to typical things like student grades, have been well documented. Having not investigated the rest of the piece I'm not sure what else is novel and untested. I personally agree that a lot of bad, unreliable psychometric testing is due to a single general effect that a subset of laypeople are going to be confused and manipulated by weird framing regardless of what is actually being studied, but that's not the sole intent of the specific test in mathematical/logical reflection.

    As for Stanovich I've criticized his work before in part because he is a poster child for the WEIRD issue, giving a lot of research a bad name without even making an effort compared to fields like evolutionary psychology. When methodology is even available in any given paper I've seen from him, besides the issue of self-reporting by subjects which you too noted, one finds worrisome samples with small n's, skews sometimes like 80% female, within narrow age ranges, and volunteers being not even good university students but ones who are decidedly in the bottom half of the curve. WEIRD however is a methodological criticism and Stanovich at least appears to be a hardworking, honest researcher. His research on reading/childhood learning is probably better than throwing his hat into the ring of vague, sometimes unreplicable priming and framing experiments.

  10. While I have published many scientific papers, I have only recently started to work with data that includes measures of IQ. Something happens to you when you work with IQ data. First, when you actually see the psychometric properties of IQ tests you realize that the tests are the products of serious and prolonged scholarly investigation. Few tests have psychometrics as neat as do IQ tests. Second, you see how well IQ tests predict status later in life. Finally, the data on group differences are so consistent that they cannot be produced by chance, by testing effect, or by any other known mechanism. IQ data over time, place, and culture reveal the same patterns. Not bad for something social constructionists say doesn't exist.

    1. Glad you find the measure useful, and appreciate its power.

  11. I read Kahneman's much celebrated book but it turned out to be a lot of magician's tricks and even optical illusions like you saw in Ripley's Believe It or Not comic strips many decades ago.

  12. The bat and ball question can be solved in half a minute using simple algebra.

    X + (X + 100) = 110
    2X + 100 = 110
    2X = 10
    X = 5


  13. Thank you! However, I take issue with your use of the words "simple" and "mechanical". The problem requires people to understand the special meaning of "more than" in an unfamiliar context, and then know that another approach must be used, and then know that algebra will help, and then know and remember their algebra. Mechanical to you, Frau Katze, but not to me, and not, I imagine, to most other citizens. However, we are agreed that this is very much a test of intelligence, and not a separate test of "rationality". Thanks for your contribution.

  14. I would agree it's not a test of "rationality." That's just a new buzzword, I guess!

  15. Here's a point that I got from your post that I think it didn't state as clearly as it might have:

    "Rationality" questions might well measure g better than standard questions because the standard questions are constrained by fear of discrimination lawsuits and so can't seem tricky or unfair. If true, however, an academic test designed on that basis would be illegal under current law, so we shouldn't pooh-pooh the current tests; they're the best the government will allow.