Sunday, April 18, 2010

Analyzing the Media's Interest (aka Joseph's Folly)

Yesterday I wrote a rebuttal to something that Joseph posted at Natural Variations. Joseph was trying to show that the media was losing interest in the so-called "antivax" movement and used a chart he made up to prove his point. I thought his analysis was flawed and pointed that out to him, but he obviously felt it was valid.

So I did what every sarcastic person would do and wrote a rebuttal here that tried to point out how silly his analysis was. I did this by doing a similar analysis on the topic of neurodiversity. What I wrote was meant to be a tongue-in-cheek analysis of how uninterested the media was in the neurodiversity movement (fortunately for most people with autism, the media seems to have very little interest in playing up autism as some sort of gift).

Joseph saw what I wrote and responded with a misinterpretation of what I said. He seemed to think that I reproduced his initial result, even though I thought I was quite clear that I felt the entire analysis was bogus. I did not go into any great detail at the time about what was wrong with what he was saying, but since he has chosen to continue the error, I thought it might be a good idea to show exactly where his analysis is flawed.

So, let me start at the beginning, at the root of the problem: the raw data that Joseph used in his analysis. The data was obtained by doing simple searches on Google News Archive for specific search terms. Each term was searched for a specific year (1997 to present), and the number of articles per year was collected. Using Google News Archive as a source of data is the first major problem and, in my opinion, makes his entire analysis almost meaningless.

The problem, simply put, is consistency. If you are looking at how a specific value changes over time, you had best be measuring the same thing for each data point. The problem with Google News Archive is that the media sources that go into it change quite frequently. If you search for articles in 1997 it might use 100 sources (as an example), but the same search for 2007 could be looking in 500 sources. These sources might have some overlap between the two periods or they could be completely different.

The other problem with Google News Archive is that it might not be representative of the media as a whole. Some major publishers have decided to exclude their materials from the archive while others, such as the Associated Press, might be included multiple times due to syndication. Yet another problem is that a source such as the New York Times carries no more weight in the article count than some random small-town paper, and possibly less. Quality doesn't count, just quantity.

The end result is a lack of a consistent data set. Each year's worth of data is likely coming from a different set of publishers, so no two years would be measuring the same thing. If you are trying to look at a trend, you want to be measuring the same thing for each data point.

A much better approach would have been to pick several large media organizations that have been in business since at least 1997 and look at the number of articles that each of these published on the subjects over the years.  This approach would have a consistent set of sources, eliminate duplications, and would allow less credible outlets to be excluded.

So the problem is that Joseph is not comparing apples to apples and his yearly numbers don't even represent the same thing each year.  Garbage in, garbage out.

But let's set that problem aside for a moment and look at the actual raw numbers. The numbers below represent the number of articles per year from Google News Archive that come up for the search terms "autism" and "autism and vaccine". Please note that the value for 2010 is an estimate that I created using the values to date from 2010. I used a relatively simple way of estimating, but it should be accurate enough for these purposes.




Year     Vaccine     Autism
1997          52      1,840
1998         126      2,970
1999         191      3,580
2000         408      4,720
2001         764      6,010
2002       1,380      7,830
2003       1,050      8,980
2004       1,390     11,700
2005         783     14,100
2006         623     13,500
2007         807     21,000
2008       1,870     25,200
2009       1,540     25,500
2010*      1,667     23,560

The question before us is whether the media is becoming more or less interested in the vaccine/autism connection, and to answer that we need to assess whether the interest is growing, declining, or staying the same.

Just looking at the raw numbers above, a few things are clear.

First, the number of articles about autism has exploded, increasing more than tenfold since 1997. But do you notice the increases in 1998 and 2007? Each of those is about a 60 percent jump over the previous year (61% and 56%, respectively), and I have to wonder if they are caused by changes of data sources in Google News Archive.

Second, the number of vaccine/autism articles has grown as well. The values start out relatively small and grow rather rapidly until they reach their first peak in 2002-2004. The value then drops drastically the next year and stays low for two more years before bouncing back beyond its former peak in 2008. The following two years come in slightly below that 2008 high, but are still higher than the earlier peak in 2002-2004. Again, I have to wonder if the sharp drop in 2005 and the equally sharp rise in 2008 aren't results of changes in Google News Archive.

If you look at the chart below, the trend in vaccine/autism articles is clear. The number of vaccine/autism articles is going up with time. I added a simple three-period moving average to the chart to show the general trend of the numbers (a moving average smooths out the irregularities in the data and helps to show the general direction).
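For anyone who wants to reproduce the smoothing, here is a minimal Python sketch of a three-period moving average over the vaccine/autism counts from the table. It assumes a trailing window (each point averages that year and the two years before it); the chart may have been built with a different windowing, and the function name is just illustrative.

```python
# Vaccine/autism article counts per year, copied from the table above.
# The 2010 figure is the estimate, not a full-year count.
years = list(range(1997, 2011))
vaccine = [52, 126, 191, 408, 764, 1380, 1050, 1390, 783, 623, 807, 1870, 1540, 1667]


def moving_average(values, window=3):
    """Trailing moving average: one value per position that has a full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


smoothed = moving_average(vaccine)

# The first average covers 1997-1999, so it is reported against 1999.
for year, avg in zip(years[2:], smoothed):
    print(year, round(avg))
```

A longer window would smooth the series more but would also hide the 2005-2007 dip, which is one of the features worth looking at.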


If we were just measuring interest by number of articles, there is clearly an increase in interest over time. But Joseph went beyond this, so let's keep going.

If you remember what the two sets of numbers represent, you will see that the vaccine/autism set is a subset of the total autism set. In other words, the vaccine/autism articles are only a part of the total interest in autism. So let's look at the number of vaccine/autism articles alongside the number of autism articles to see if there is a relationship.



As you can see, these two numbers tend to move together. When there are more articles about autism there are more about vaccines/autism and, when there are fewer about autism, there tend to be fewer about vaccines/autism. One of the ways you can measure this relationship is with a correlation, and these two data sets are highly correlated (~ 0.76).
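The correlation can be checked with a few lines of Python. This is a sketch of the standard Pearson correlation coefficient computed over the two columns of the table; which years went into the figure quoted above (in particular, whether the 2010 estimate was included) is not stated, so the result may differ somewhat from ~0.76.

```python
from math import sqrt

# Article counts for 1997-2010, copied from the table above (2010 is an estimate).
vaccine = [52, 126, 191, 408, 764, 1380, 1050, 1390, 783, 623, 807, 1870, 1540, 1667]
autism = [1840, 2970, 3580, 4720, 6010, 7830, 8980, 11700, 14100, 13500,
          21000, 25200, 25500, 23560]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


r = pearson(vaccine, autism)
print(f"correlation: {r:.2f}")
```

Keep in mind that two series that both rise over time will correlate strongly almost by default, so a high value here says the counts move together, not why they do.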

We know the topics tend to move together, so the next question is: what is the overall trend for both sets?

There are clearly more articles for both sets as time goes on, but is there a consistent pattern of growth in the number of articles, showing an increasing interest, or is there a slowdown, showing a decrease? One way to measure this is to look at the year-over-year change of each value, expressed as a percentage. If interest in a topic is growing, you expect to see positive growth each year.


As you can see from the above chart, both topics seem to increase most years. The vaccines/autism topic shows growth in 9 of the 13 years, while the autism topic in general shows growth in 11 of the past 13 years. The average yearly growth for vaccines/autism is 44% and the average growth for autism is 23%. It would have been nice to see a more consistent pattern of growth (percents all trending up or down), but the real world isn't as clean as that.
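The year-over-year figures are easy to recompute from the table. The sketch below is one way to do it in Python; the function name is made up for illustration, and the average is a plain arithmetic mean of the yearly percentages, which is assumed here to be what the figures above used.

```python
# Article counts for 1997-2010, copied from the table above (2010 is an estimate).
vaccine = [52, 126, 191, 408, 764, 1380, 1050, 1390, 783, 623, 807, 1870, 1540, 1667]
autism = [1840, 2970, 3580, 4720, 6010, 7830, 8980, 11700, 14100, 13500,
          21000, 25200, 25500, 23560]


def yoy_growth(counts):
    """Percent change from each year to the next (one value per year after the first)."""
    return [(cur - prev) / prev * 100 for prev, cur in zip(counts, counts[1:])]


for label, counts in (("vaccine/autism", vaccine), ("autism", autism)):
    growth = yoy_growth(counts)
    up_years = sum(1 for g in growth if g > 0)
    average = sum(growth) / len(growth)
    print(f"{label}: growth in {up_years} of {len(growth)} years, "
          f"average {average:.0f}% per year")
```

An arithmetic mean of percentages is pulled upward by the big early jumps from a small base, so it overstates the typical year; it is used here only to match the style of the figures in the text.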

So we have an overall increasing number of articles and most years are showing positive growth, so why does Joseph think that interest in vaccine/autism is shrinking? The answer is that he didn't directly answer that question; he answered a different one.

If you remember, the question is whether the media is becoming less interested in writing about the possible vaccine/autism connection, and Joseph used a chart similar to the one below to show that it was.


But look closely at what the chart actually shows, and you will see the trick.

The chart isn't showing you the trend in the vaccine/autism articles.

What the chart shows is the relationship between the number of vaccine/autism articles and the total number of autism articles. So the question that Joseph is answering is whether the number of vaccine/autism articles is growing as fast as (or faster than) the total number of articles about autism. And the answer to that question is that the vaccine/autism articles, as a percentage of all autism articles, are going down.

But that wasn't the question.  If the number of autism articles were constant, or changing slightly, then this analysis might make sense.  But, if you look at the raw numbers above, you can see that the autism numbers are growing rapidly.

Let me make this simple. On the chart above for 2002, you can see the value is almost 18 (17.6 to be exact); that value is a percentage. Look at the value for 2009: it is about 6 percent. This is the trick: which value is larger? Did the number of articles (interest) increase or decrease between these two time periods? If you just consider the percent values on the chart, you would think that there was less interest in 2009 than in 2002. But, in reality, the number of articles in 2009 (1,540) was about 12% higher than in 2002 (1,380), not lower, yet Joseph would have you believe that there was a drop in interest between those two years.

Percentages can only be compared as measures of size if they have the same denominator. In this case, the denominators aren't even close (7,830 autism articles in 2002 versus 25,500 in 2009), and the result is that the analysis is skewed.

Let me make this even simpler: which is more, 20% of 100 or 10% of 400? (Hint: the 10% is more, 40 versus 20.)
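The same point can be made with the actual numbers. This short Python sketch recomputes the share and the raw count for 2002 and 2009 from the table, then repeats the 20%-of-100 comparison; the variable and function names are only for illustration.

```python
# (vaccine/autism articles, all autism articles) for the two years being compared,
# copied from the table above.
counts = {2002: (1380, 7830), 2009: (1540, 25500)}


def share_percent(part, total):
    """Part as a percentage of total."""
    return part / total * 100


for year, (vaccine, autism) in counts.items():
    print(f"{year}: {vaccine} vaccine/autism articles, "
          f"{share_percent(vaccine, autism):.1f}% of all autism articles")

# Change in the raw vaccine/autism count between the two years.
change = (counts[2009][0] - counts[2002][0]) / counts[2002][0] * 100
print(f"change in article count, 2002 to 2009: {change:+.1f}%")

# The simpler version: a smaller percentage of a bigger total can be the larger number.
twenty_percent_of_100 = 100 * 20 / 100
ten_percent_of_400 = 400 * 10 / 100
print(twenty_percent_of_100, "<", ten_percent_of_400)
```

The share falls because the denominator more than tripled, while the numerator, which is the thing the original question was about, went up.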

So what is the answer to the real question?  Is the media's interest in the vaccine/autism story growing, shrinking, or staying the same?  The chart below should tell you the answer.


The number of autism articles has been growing quickly, but the overall interest in the vaccine/autism articles has not been shrinking. As a matter of fact, if you look at the numbers above, you can see that the interest in the vaccine/autism topic has also been growing slightly, just not as fast as the autism topic as a whole.

Joseph's analysis is simply wrong.

1 comment:

  1. One wonders how many of these 'articles' are actually 'new' and how many are 'reblogs' of a single source article?
    One would expect mindless blogging of some topics to be more prevalent than mindless blogging (read: slacktivism) of others.
