Thursday, October 13, 2011

A Critical Look At The Level and Nature of Autistic Intelligence

As I mentioned a few days ago, Michelle Dawson et al published yet another paper on "the level and nature of autistic intelligence".  I didn't go into any details about the paper at that point - even though I had read it and had started writing this post - because it had disappeared from the journal's website and I wasn't sure what form it would be in when it came back.  Well, the paper is back and since Ms. Dawson was kind enough to call me a "renowned blogger" I thought the least I could do was give the paper a serious look.

I am actually going to be talking about two papers by Dawson et al - "The Level and Nature of Autistic Intelligence" and "The Level and Nature of Autistic Intelligence II" - because they use some of the same underlying data and they talk about the same idea.  The text of both papers is freely available, here and here, so if you are really interested in the subject I suggest that you read them for yourself.

With that said, the basic premise in both papers is that people with autism are actually more intelligent than is commonly thought.  Conventional wisdom (and science) holds that people with autism are often intellectually disabled and, even when they aren't, have intellectual challenges that place them at a disadvantage to a "typical" person.  These papers try to show that people with autism have a different way of thinking and that it isn't so much that they lack intelligence but rather it is the tests that are used to measure their intelligence that are lacking.

Or, in the words of the second paper, "autistic spectrum intelligence is atypical, but also genuine, general, and underestimated".

As I said before, in some ways I completely agree with that statement.  Conventional intelligence tests rely on certain abilities, such as the ability to understand verbal communication and a ready understanding of the environment, and are very challenging for people with autism.  A person with autism might very well score lower than a typical person because they have problems with certain core skills, have problems focusing, or have sensitivities to the immediate environment, not because they lack intelligence.

But that is the nature of the disorder called autism - it disrupts a person's ability to function in a "typical" manner.  It doesn't necessarily mean that they lack intelligence but it makes the application of that intelligence difficult.

Getting back to the papers: in both, the authors gave two different intelligence tests (not really, but more on this in a minute) to several groups of children and adults who either were "typical", had autism, or had Asperger's.  The first paper focused on children and adults with autism while the second focused on children and adults with Asperger's.  In each paper, there were four groups - typical children, typical adults, children with autism/Asperger's, and adults with autism/Asperger's.

Some of the "typical" children and adults were in both papers although it is never spelled out exactly how many were in both or whether they were retested for the second paper.  That last bit is important because the way the IQ tests were administered differed between the papers.  So if the data from the first paper was just reused in the second then it might have skewed the results of the second.

All of the groups were given two different styles of intelligence tests, the Wechsler Intelligence Scales III and Raven's Progressive Matrices.  The children were given the Wechsler Intelligence Scale for Children (WISC-III) and the adults were given the Wechsler Adult Intelligence Scale (WAIS-III).  All of the participants were given the standard form of Raven's Progressive Matrices.  There are two other forms of the Raven's test, including one that is meant for younger children or children with learning disabilities.

There are two main differences in how the tests were administered between the two papers.

In the first paper, the Raven's test was given to all participants with no time limit whereas in the second paper, the standard time limit (40 minutes, I think) was applied.  I think the impact on the scores from that difference would be obvious.

The second difference is that the tests in the first paper were scaled according to North American norms while in the second paper Canadian norms were used.  This is a little bit obscure, so let me explain.

The basic idea with modern intelligence tests is to give a bunch of questions and then score the number of correct answers.  But since this raw score does not really tell you anything meaningful, these scores need to be translated into some more useful form, such as an IQ score or a percentile.  To do that, the test scores are "normalized" by giving the test to a large number of people and then using the resulting scores to establish what the typical score is and what range of scores is to be expected.  The typical score is set to be an IQ of 100 (50th percentile) and 1 standard deviation is set equal to 15 IQ points.
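To make that concrete, here is a minimal sketch of the scoring step in Python.  The norming numbers are made up for illustration (real tests use age-banded lookup tables rather than a simple linear formula), but the basic z-score idea is the same:

    from scipy.stats import norm

    # Hypothetical norming data (made-up numbers): the mean and standard
    # deviation of raw scores in the norming sample.
    NORM_MEAN_RAW = 42.0
    NORM_SD_RAW = 8.0

    def raw_to_iq(raw_score):
        """Map a raw score onto the IQ scale (mean 100, SD 15)."""
        z = (raw_score - NORM_MEAN_RAW) / NORM_SD_RAW
        return 100 + 15 * z

    def iq_to_percentile(iq):
        """Convert an IQ to a percentile, assuming a normal distribution."""
        return 100 * norm.cdf((iq - 100) / 15)

    iq = raw_to_iq(50)                  # one SD above the norming mean -> 115
    print(iq, iq_to_percentile(iq))     # 115.0, ~84th percentile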

This picture from Wikipedia might make the idea clearer -

[Figure: the normal curve of IQ scores, showing how IQ points, standard deviations, and percentiles line up.]
So the problem is that the two papers used two different sets of translations from the raw test scores - North American norms and Canadian norms - and there are differences between the two mappings.  So you cannot directly compare the final results between the papers without first recalculating them against a common set of norms.
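Here is a toy illustration of why the choice of norms matters - again with made-up norming numbers, not the actual WISC/WAIS values - showing the same raw score landing at a different IQ and percentile under two different norming samples:

    from scipy.stats import norm

    # Made-up (mean, SD) of raw scores in two hypothetical norming samples.
    norms = {"North American": (42.0, 8.0), "Canadian": (44.0, 7.5)}

    raw = 50
    for name, (mu, sigma) in norms.items():
        iq = 100 + 15 * (raw - mu) / sigma
        pct = 100 * norm.cdf((iq - 100) / 15)
        print(f"{name}: IQ {iq:.0f}, {pct:.0f}th percentile")

    # North American: IQ 115, 84th percentile
    # Canadian: IQ 112, 79th percentile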

So was the "typical" data that was reused in the second paper re-normalized with the Canadian norms or was it kept under the North American norms?  And were the comparison charts from the first paper that were included in the second paper (i.e. Figure 1) adjusted as well?  After reading both papers several times, I still can't say one way or the other for certain.

But let's set that aside for now and consider the intelligence tests that were used - Raven's Progressive Matrices and Wechsler Intelligence Scales for children and adults.

The Raven's test is an old and somewhat simple test that presents a series of progressively more difficult visual puzzles.  Each puzzle takes the form of a shape that has a piece missing and a set of possible answers.  This site has an example of what one of the questions might look like.  The person taking the test has a fixed amount of time to answer as many of these puzzles as possible.

The Raven's tests were initially based on the idea that intelligence was a single, unified general ability.  Under this model, you either had "intelligence" or you did not.  But like all primitive models, this idea of a single unified intelligence has been gradually replaced by the idea that there are many different types of intelligence and that a person is going to have a varying level of intelligence depending on exactly what part of their intelligence you are measuring.

Which is where the Wechsler tests come into play.  These tests attempt to measure the different types of intelligence through the use of different subtests, each with a specific focus.  Under this newer model of intelligence, the Raven's test is no longer thought to measure "intelligence" as a whole but rather one subtype of intelligence called fluid intelligence.  Fluid intelligence is the ability to think logically and solve problems in novel situations, independent of acquired knowledge.

So on one hand you have the Raven's test that is measuring the ability to think logically and solve problems and on the other you have the Wechsler tests that are trying to measure actual abilities and the ability to apply what you know to a given situation.

I don't want to go into any more detail about the differences between the tests because that would take a long time and I am nowhere close to an expert (or even that knowledgeable) on the subject.  If you are interested in the differences between the tests or the history and current theories of intelligence, I suggest starting with the Wikipedia entry on the subject and working your way outward from there.

But let me just say that if you have spent any time with children who have even moderate autism, you would know that the differences between these two tests highlight one of the core challenges of autism.  That being: while it can be challenging to teach a child with autism, it is equally, if not more, challenging to get the child to apply what they know to a given situation.  There is a very large gap between being able to learn, actually learning, and being able to generalize that knowledge.

But getting back to the papers, the core data point from both papers is that, while the Wechsler test shows a fragmented and uneven profile of intelligence in people with autism, the Raven's test often shows a significantly higher level of intelligence than the Wechsler does in the same group.  Furthermore, this significant difference is not present in "typical" children and adults.

So the authors concluded that, since the Raven test is thought to measure a more general form of intelligence, the difference between the two tests represents a problem with how the Wechsler tests measure intelligence with respect to people with autism.  They concluded that the Raven test is a more accurate measure of true "atypical" autistic intelligence.

As I said, I agree with this idea up to a point.  But (you knew that was coming), there are quite a few problems not only with the idea in general but also with the data in both papers.

As I alluded to above, this interpretation ignores the fact that people with autism (and children in particular) have a hard time with the generalization of knowledge.  It is one thing for them to know something when you are teaching it to them and asking highly structured questions, it is quite another for them to be able to take that knowledge or reasoning and apply it in a novel situation.

Another problem is that this interpretation ignores the widely accepted idea that people (and again children in particular) with autism have what are called splinter skills.  Splinter skills are what happens when a person has uneven development of skills and is substantially behind in some areas, ahead in others, and at the appropriate level for the rest.  So instead of having a fairly even level of skills, the person has an extremely uneven one.  For example, some children with autism will learn to read before they develop receptive or expressive verbal skills.

You can see evidence of splinter skills in the results from the Wechsler test.  You can also see it very clearly if you give a child on the spectrum a developmental test such as the Battelle.  So the data from the Wechsler and Raven's tests could easily be yet another example of splinter skills.

In my opinion, if you combine these two ideas, you could say that one of the core traits of autism is an uneven level of skill and difficulties in applying those skills.  The other core traits are an extreme difficulty in teaching skills in the first place (at least in some people) and the behaviors of autism.

But let's set all of the above aside. Let's assume that all of the data is in the proper terms and let's assume that the difference between the test values can't be explained by known properties of autism.

The next question is whether what the two tests measure is an equally valid view of intelligence or whether the tests measure different things.  Can we really look at one repetitive test of intelligence and assume that it represents potential intelligence better than another test?

I think the answer is obvious: each test provides a different view of a person's intelligence.  But to arrive at a true measure of a person's intelligence you have to consider all of the available evidence.

The next follow up question is whether the end results of the tests are directly comparable.  Does a final score in the 80th percentile on one of the tests mean the same thing as an 80th percentile score on the other?  For this to be true, both tests would have to be an equivalent measure of a person's intelligence, i.e. they would both have to measure the exact same thing.

I think it should be obvious by now that they don't, so I think that you would have to be careful in directly comparing the results between the two tests, doubly so if you wanted to do any calculations based on the numbers.

But again, let's set that aside for now and look at the actual data underlying the papers.  I normally don't like to criticize the presentation of a paper directly, but if I had to describe the data in these papers I would call it sloppy and disorganized.  There are numerous inconsistencies in how the data is presented, a few blatant mistakes, and neither paper gives a clear view of what the data actually is.

Just to give you an idea of what I am talking about.

In the first paper, there is no table that summarizes the data; you have to piece the numbers together from the text.  There are figures that are presented without any real description of what the data is, such as Figure 1, which says it presents "mean subtest scores" but then charts percentiles.  I have to wonder what the percentiles are of - correct answers or normalized results.  And the data in Figure 1 is presented for only one of the four groups in the paper, which raises the question of what the other groups look like.

In the second paper, there is a table (Table 1) that presents some of the data.  But that data is then contradicted by the first figure in the results section, and that figure is central to the results being presented.  You would think that someone would have checked that.  Later in the paper you are directed to non-existent figures.  And again, you are never presented with a clear view of the data being discussed.  Some of the data is contained in the table while other parts are presented only in the text, and then you only get to see one small part of the data.  And then there is another strange chart, Figure 2, that presents data similar to Figure 1 in the first paper but, instead of means or percentiles, presents scaled scores.  And, again, the data for the other groups in the paper is left off.

I could put together a better presentation of the data, and that is really saying something.  But after spending several quality hours going over the papers and trying to put all of the pieces together, I have some concerns about how the data was actually analyzed.

The main result in both papers was that the percentile difference between the Wechsler and Raven's tests was significantly larger in most of the autism/Asperger's groups than it was in the "typical" groups.  Most of the groups (with the exception of the Asperger children) did better on the Raven's test than they did on the Wechsler.  But the Asperger adults and both of the autism groups showed a significantly larger improvement than the others.

Which leads me to my main problem with the data - how the difference was calculated.  To put the problem simply, you cannot accurately compare the difference between two percentiles and get a meaningful result because percentiles themselves are not linear.  I think I can illustrate this best with an example.

If I have two numbers - 5 and 1 - that represent the differences between two sets of percentiles (50 and 55, 98 and 99), which one would you assume represents a greater change in intelligence?  The obvious answer is of course 5 - the change from the 50th percentile to the 55th percentile.

You would assume that a change of 5 percentiles always represents a greater change in intelligence than a change of 1 percentile.  But in this case you would be wrong: the increase from the 98th percentile to the 99th percentile represents a greater change in intelligence than the increase from the 50th percentile to the 55th percentile does.

You can see this if you change the percentiles into IQ points (see the figure above).  The 50th percentile represents an IQ of 100, the 55th an IQ of 102, the 98th an IQ of 131, and the 99th an IQ of 134.  So the 5 percentile change equates to a change of 2 IQ points while the 1 percentile change represents a change of 3 IQ points.
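If you want to check that arithmetic yourself, here is a quick sketch that inverts the normal curve with scipy.  The exact values differ slightly from the rounded table values above, but the reversal is the same:

    from scipy.stats import norm

    def percentile_to_iq(pct):
        """Invert the normal curve: percentile -> IQ (mean 100, SD 15)."""
        return 100 + 15 * norm.ppf(pct / 100)

    for low, high in [(50, 55), (98, 99)]:
        gain_iq = percentile_to_iq(high) - percentile_to_iq(low)
        print(f"{low}th -> {high}th: {high - low} percentile points, "
              f"{gain_iq:.1f} IQ points")

    # 50th -> 55th: 5 percentile points, 1.9 IQ points
    # 98th -> 99th: 1 percentile points, 4.1 IQ points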

The reason for this discrepancy is that percentiles, at least as they are used in this paper, are meant to provide a relative ordering of everyone who takes a particular test.  So scoring in the 80th percentile means that you did better than 80% of the people who took the test and worse than 20%.  The percentiles do not tell you anything about the magnitude of the difference between the groups.

So, even if you had a set of percentiles that were all from the same test, you could not subtract them and do anything meaningful with the results.  You cannot take a set of differences and order them from the smallest to the largest (which is required for the statistics used in the second paper) because you do not know which change in percentile represents a larger change.

The first paper's main conclusion is in doubt because the statistics not only assume the ability to order the results, but also assume a linear scale and a normal distribution of the data.  Even a quick look at the statistics shows that the distribution cannot be normal (e.g. range 0 to 100, mean 36, SD 26) and the differences aren't ordinal, let alone linear.
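As a quick sanity check on those reported summary statistics: if a variable bounded to the range 0 to 100 really were normally distributed with that mean and SD, a noticeable chunk of it would have to fall below zero, which is impossible for percentiles.

    from scipy.stats import norm

    mean, sd = 36, 26   # reported summary statistics, bounded range 0-100

    # Fraction of a Normal(36, 26) that falls below the lower bound.
    print(norm.cdf(0, loc=mean, scale=sd))   # ~0.083 -> about 8% below zero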

The second paper at least used statistics that did not depend on a normal distribution.  But even still, the main statistics depends on the data being ordinal.

So when the second paper says this in the results section -

"The Asperger adults demonstrated an advantage of RPM over Wechsler FSIQ that was significantly greater than that of the non-Asperger adult controls, Mann-Whitney U=366.5, p<.01"

That statement is completely unsupported by the data.  In pure numerical terms, the difference might seem to be larger, but in terms of an actual increase in intelligence that statement is very much in doubt.

Another quibble with the results is the use of averages (means) to represent the groups rather than medians.  If you have a set of non-linear values such as these percentiles, it really isn't valid to take an average because it is going to misrepresent where the middle of the group is.  That goes double when the data is badly skewed, as is the case with the Asperger adults' Raven's test in the second paper.  In that case the "average" was 74 but the standard deviation is 50(!).  For that to happen, the bulk of the data has to be well below the 74th percentile, which means the median value would be significantly lower.
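Here is a toy example - the numbers are made up, not taken from the paper - of how a cluster of scores at the ceiling can drag the mean above the point where the middle of the group actually sits:

    import numpy as np

    # Made-up percentile scores: most of the group in the 45-75 range,
    # with a cluster pinned near the 99th percentile.
    scores = np.array([45, 50, 52, 55, 58, 60, 62, 65, 68, 70, 72, 75,
                       97, 98, 98, 99, 99, 99, 99, 99])

    print(np.mean(scores))     # 76.0 -- the "average" percentile
    print(np.median(scores))   # 71.0 -- where the middle of the group is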

Although, to be fair, there are some valid secondary results.  For example, when the paper reports that "the Asperger adults’ Wechsler VIQ was significantly higher than their PIQ (55th vs. 39th percentile), Z =3.43 p<.01", that could be valid because the data is in the same terms and the statistics were (apparently) used properly.  What it means without the main result though is an entirely different question.

Who knows, maybe I am missing something fundamental about the data here or am completely wrong about the percentile thing.  But from what I can see in the papers and what I know about statistics, it looks like the conclusions are based on a faulty analysis.  If anyone sees something obvious that I missed, please point it out in the comments.

I really could go on to point out quite a few other problems with the data, such as the fact that the percentile differences aren't even based on the same test, or that the number of participants in the papers is rather small, or that confounding social/demographic factors weren't adjusted for.  But since the main result is likely invalid, I don't really see the point in beating a dead horse.

Whew.  Anyone still reading this?

Now that I have rambled about these two papers far longer than I had wanted, let me just say that while I think these two papers are mostly worthless, the idea that people with autism can be intelligent isn't.  There is nothing inherent in autism that says that everybody who has autism is automatically intellectually disabled, although there appears to be a large group that is.

What I think is obvious is that autism disrupts a person's ability to apply their intelligence.  Even if you throw out every problem that I pointed out with these papers and take their data at face value, the data still fully supports the idea that there is a break between what a person can do and what autism allows them to do.


References

1. Dawson M, Soulières I, Gernsbacher MA, Mottron L. The level and nature of autistic intelligence. Psychol Sci. 2007 Aug;18(8):657-62. PubMed PMID: 17680932.

2. Soulières I, Dawson M, Gernsbacher MA, Mottron L. The Level and Nature of Autistic Intelligence II: What about Asperger Syndrome? PLoS One. 2011;6(9):e25372. Epub 2011 Sep 28. PubMed PMID: 21991394.

13 comments:

  1. I was unaware that they used two different tests or that any of the Raven's were timed. In Soulières' fMRI study, which was based on data from the first paper, the autistic group did the Raven's questions faster than the normal controls. You missed the fact that the differences between the Asperger's children and the controls were far less than the differences between the adults, and the question of whether children with autism could theoretically be taught in different ways than typical children is more relevant for children than for adults, since adults are no longer in school.

    Also, a problem with Dawson (2007) and the recycled controls that you failed to touch upon is that the controls were not typical (to use your word, and maybe Dawson and Soulières' also) in any manner. They were a self-selected sample, recruited from a newspaper ad, who were in the 70th percentile on the Wechsler. The 70th percentile is not typical of any group. They were also mostly or all males, and I seem to recall there have been some studies showing sex differences in Raven's scores between men and women.

    Another problem was that the autistic group in the first paper was not typical: the sex ratio was 12:1 M:F whereas it is 4:1 in a typical sample of autistics, so this was not a representative group of autistics. Since females are generally lower functioning and are more likely to have intellectual impairments, the autistic group in the first paper was likely higher functioning. Also, in Dawson (2007) the adult group of autistics averaged in the 50th percentile on the Wechsler, so they were certainly not an intellectually impaired group in that study. In Boelte (2009), differences were found in the RPM vs. Wechsler in the autistic group, but they were far less pronounced than in Dawson (2007) and they were limited to the lower functioning group (IQ 85 or less), if I remember correctly. They had a 3:1 ratio of males to females, a much more representative sample of autistics. Since the higher scores on the RPM were confined to the lower functioning group, it is unlikely they would have gotten anywhere near the same results as in Dawson (2007), so in that sense Boelte (2009) may be a nonreplication of Dawson (2007).

    If there are no sex differences between typical males and females in scores on the RPM, Dawson neglected to mention it in her first study to account for the very atypical control group. There was no documentation that persons scoring in the 70th percentile on the Raven's would behave the same way in comparisons of the Wechsler vs. the RPM as a group of typical controls who score in the 50th.

    I wish I knew enough about statistics to comment on the rest of your analysis.

  2. An addendum to the first post: in the beginning I stated that in Soulières' fMRI paper the autistics only did the questions faster, not better, than the controls, so this would be irrelevant for drawing any inferences, but apparently some of the Raven's tests are timed if you are correct.

    Also, I should point out that if there is any correlation between a high score on the Raven's and success academically, socially, professionally, or in any other life endeavor, Dawson neglected to mention that also.

  3. So after attempting to read that, and getting lost somewhere in the middle... need more coffee :)... I am going to simply say...

    "You cannot prove the severely autistic are severely MR as more than one autism blogger tries to claim".

    Why??

    They aren't VERBAL.

    I have one of those "severe" children. We now have the starts of language. We now have a 200+ word flipbook to teach sentences. I am currently testing Gr 1 English curriculum and although we haven't hit the stories section... he's zipping through it.

    The ONLY person that cared about the IQ test in Gr 3 was the gov't's paperwork to get us in this LD class we're in. NOBODY else did, including the psychometrist that did all the testing. Everyone else wanted to know what he knew and could do. The psychometrist's recommendation... since he could read easily (thanks to his Mommy) was "teach him to communicate, you'll be surprised what he knows". That has been and will continue to be our #1 goal.

    One thing I am surprised at is that although he is becoming verbal, and is definitely a lot smarter than anyone thought possible, the "autism" isn't lessening. Whereas with elder bro and his speech delay - not autistic speech delay - it has eased to where he "passes for normal" most of the time.

  4. Hi Jonathan,

    You are quite right about there being a number of other problems with the data in both papers. There were so many things off in both papers that it was hard to pick which problems to point out. So I went for the most egregious ones and didn't really mention the rest. But as you pointed out, the demographics of the participants, the raw scores, and the participant selection methods were also large problems.

  5. Hi Farmwifetwo,

    It took me more than one attempt to read the completed post all the way through for proof-reading, so I know what you mean.

    As for the general issue of whether ID is more common in the more severe forms of autism, I am on the fence. On the one hand I know that it is hard to get a good indication of intelligence in a person with more severe autism. But on the other, there is evidence that causes of ID (e.g. fragile X, creatine deficiency syndromes) can cause the behaviors of autism.

    And then there is the fact that, up until 20 years ago, the majority of cases of autism were seen in children and adults who did have other issues such as ID. It is only in the relatively recent past that autism is appearing frequently in otherwise "typical" children.

    So perhaps looking at the question as whether severe autism causes ID is the wrong way of looking at the problem. Maybe you have to consider autism as a symptom and look past it to the underlying cause and whether that cause is linked to ID or not.

  6. If Raven's tests are a splinter skill, what do they translate to in the real world? Abstract mathematics of course but more everyday stuff too, surely?

  7. Nerkul,

    I am not sure what exactly the Raven's tests would translate into in "real life" for people with autism. The standard definition seems to be abstract reasoning that is independent of a situation but I don't think it can represent that for someone with autism.

    Many (but not all) people on the spectrum have major problems dealing with novel situations or unfamiliar problems - the exact thing that having a high fluid intelligence would help you deal with.

    So maybe the repetition and structure of the Raven's test helps to get it past the interference of autism and measure some underlying abstract reasoning that can't normally be applied because of the situational difficulties of autism?

    Or maybe the structure of the test plays into the obsession with sameness that so many people on the spectrum have and so their brains are able to deal better with the repetitive task?

    Although after looking at all of the papers, I can't tell if the jump from the Wechsler to the Raven's tests in the autism groups really is larger than the jump that the "typical" groups also showed.

    The result could be nothing more than the standard bump that almost everyone in both papers showed between the tests.

  8. Reasonable to suppose that Raven's tests the same searchspace techniques that Michelle Dawson writes about here. Autistics like to find more general solutions rather than local optima. Anything that can be modelled abstractly would seem to be amenable to this approach.

  9. "Autistics like to find more general solutions rather than local optimal"

    I don't think that statement is true in general nor does there seem to be sufficient evidence to support that theory.

    If anything it seems like the reverse would be true. Autism seems to force people to focus on the individual details rather than looking at the larger picture. Sort of like focusing on all of the details of a single tree and ignoring the fact that they are in the middle of a forest.

    On a more realistic note, children with autism have a hard time generalizing skills. Even when you manage to teach them how to do something specific (e.g. put on a shirt) they cannot always apply that skill on a more general level (e.g. put on a jacket).

  10. It's definitely true. The searchspace techniques in that study Michelle Dawson commented at length on are an example, plus if you just look at how autistics struggle with novel situations (which require heuristics, local optima) but once they have familiarity with a domain can construct new solutions that aren't accessible to nonautistics. This is visible to varying degrees depending on intelligence, which obviously varies for autistics as much as for anyone, but if you look at famous examples like Temple Grandin, Paul Dirac or Alan Turing, they each abstracted general solutions in problem domains where equally smart nonautistics were using local optima. It's a definite feature. You could see it in your daughter if you watched with an open mind.

  11. How fluid intelligence fits into that is deeply interesting, btw. It would follow from everything I've suggested that nonautistics don't understand novel situations. They apply heuristics and prior knowledge and, mostly, it works well and quickly. Autistics would use their fluid intelligence to work out solutions, which may take a while.

    An example from social dynamics: many times I've seen psychopaths and other actors completely dupe nonautistics while being transparent to autistics. These are superficially standard situations where the heuristic solution leads you to precisely the wrong conclusion. Optical illusions are another example: it's typical that autistics aren't fooled.

    So, to me it makes a lot of sense that autistics have superior fluid intelligence. They simply need it. They practice a lot, and results have shown that fluid intelligence can be improved with practice.

  12. Nerkul,

    I'm sorry, but the idea that some solutions are only accessible to people with autism is just romantic thinking and, quite frankly, rubbish. People with autism are a subset of all people and there is nothing inherent in autism that would set them apart from the rest of humanity.

    If a person with autism can solve a problem in a certain way (i.e. animals and Temple Grandin) then the right "typical" person would be capable of doing the same. It all depends on the perspective that a person is coming from.

    Nor is there really anything to the idea that people without autism don't understand novel situations; it all depends on the specific person. Some people will rely on heuristics, some will understand situations quickly, but most people will use a combination of both.

  13. Certainly not 'only accessible', but autistics' access to higher dimensional visualising of problem spaces naturally makes many solutions more available. This says nothing about their frequent problems in lower dimensions where the rules are more animal than logical.
