I said in my earlier post that I might write about Richard Tol’s publicly stated views on John Cook’s 97% consensus paper. Since then, however, I’ve had a rather frustrating Twitter discussion with Richard Tol that, sadly, ended quite sourly – as these discussions have a tendency to do. Let me make this clear: I’m not trying to wind up Richard Tol, nor am I trying to tick him off – although I will probably end up doing so, even if that is not my intent. I’m simply commenting on what he has already stated openly in the public domain.
Anyway, Richard Tol now has a 7th draft of his paper criticising the Cook et al. study, and appears to want to resubmit it to a journal. For those who don’t know, his first submission was rejected by the editor of the journal to which it was submitted. To be fair, Richard has made a number of changes to the paper and has listened to some of the criticisms and suggestions. He even acknowledged that something I had said had made him check something again and slightly change one of his tests. Admittedly, he only mentioned this during a Twitter debate about how best to undertake academic discussions, and so was, I think, more trying to make a point about discussions than actually acknowledging that I had said something credible – although he did acknowledge it, so I should at least be grateful. Now, I do still have issues with Richard’s statistical tests, but I thought I would make a more fundamental point.
Richard’s been making a great deal of noise about John Cook not releasing all his data. He wants all the ratings, not just the final ones, and even thinks that individual keystrokes and time stamps should be provided, though he accepts that this may be asking a bit much. He’s also been critical of the lack of a robust survey strategy. Now, here’s where I have an issue. If the goal of the Cook et al. work had been to survey a group of people to, for example, determine their views on climate change, then Richard Tol would be perfectly correct. Such work would require a well-defined survey strategy and that you keep track of all your data, so that you could eliminate biases, or discuss any that existed. You’d also need to know something about the sample: for example, the age distribution, gender distribution, political affiliations, and scientific backgrounds. In such a case I would completely agree with Richard.
However, the goal of the Cook et al. work was not to survey a group of people; it was to survey a set of abstracts that had been extracted from a database using a well-defined search. The people involved were simply a tool that analysed these abstracts so as to ultimately give each abstract a rating that reflected its position with regard to anthropogenic global warming (AGW). In some sense, what matters is whether or not the final ratings “properly” reflect the position of each abstract, not really how those ratings were arrived at. Now, I should be careful. I’m not suggesting that one doesn’t need to know about the strategy, simply that the requirements with respect to the intermediate data are different from what would be required if the goal of the work were to study the people doing the rating, rather than the abstracts.
Let me try to give you an analogy. I do quite a lot of computational work. I take some set of initial conditions (my raw data, if you like) and evolve them using a simulation to produce a result that I then analyse. If someone thinks my results are wrong, they wouldn’t typically ask for my code so that they could check it line by line. They might ask for my code, and I might give it to them, but they would then redo the simulations (after checking the code). More typically, they would simply redo the simulations using their own, or another, code. No journal would let them publish a paper pointing out that my code had an error in line 1002 (for example). A journal would expect them to show the significance of the error by redoing some of the simulations.
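To make the analogy concrete, here is a minimal, purely illustrative sketch of what that kind of check looks like in practice: the same toy problem is run through two independently written implementations and the results are compared within a tolerance, rather than anyone inspecting the reference code line by line. Everything here – the toy problem, the function names, the numbers – is made up for illustration and is not anyone’s actual code.

```python
import numpy as np

def evolve_reference(x0, v0, dt, n_steps):
    """One 'code': leapfrog integration of a simple harmonic oscillator."""
    x, v = x0, v0
    for _ in range(n_steps):
        v += -x * dt / 2.0
        x += v * dt
        v += -x * dt / 2.0
    return x

def evolve_independent(x0, v0, dt, n_steps):
    """A second, independently written 'code': here, the analytic solution."""
    t = dt * n_steps
    return x0 * np.cos(t) + v0 * np.sin(t)

# Rather than checking the reference code line by line, rerun the same
# problem with the independent implementation and compare the results.
x_ref = evolve_reference(1.0, 0.0, 1e-4, 100_000)
x_ind = evolve_independent(1.0, 0.0, 1e-4, 100_000)
print(np.isclose(x_ref, x_ind, rtol=1e-3))  # True => the results are consistent
```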
So, in my view, to check the validity of the Cook et al. work, you need to redo the analysis of some of the abstracts to see if the new results are consistent with those obtained by Cook et al. Simply showing that some of their intermediate data fails a statistical test doesn’t really tell you whether there’s anything significantly wrong with the Cook et al. results. In fact, in earlier drafts of his paper, Richard Tol acknowledged that the Cook et al. results probably did reasonably reflect the level of agreement in the scientific community (this appears to have been left out of the recent draft, although maybe I’ve missed it). Failing a statistical test may indeed indicate that something is wrong, but it doesn’t prove it. Furthermore, there’s another issue that I have. Presumably someone could design such a study with a very precise and rigorous analysis procedure, one designed to pass all (or most) of Richard Tol’s tests. But that doesn’t tell you that such a procedure can suitably rate a sample of abstracts; it just tells you that it satisfies a set of tests that someone thinks are important. Just to be clear, let me restate something. If the goal had been to study the people, then passing these tests might well be relevant, but the goal wasn’t to study the people; it was to study the abstracts.
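To illustrate the kind of check I have in mind – not anything Richard Tol or Cook et al. have actually done – here is a minimal sketch with entirely made-up ratings: independently re-rate a sample of abstracts, then measure how well the re-ratings agree with the original ratings, for instance via raw agreement and a chance-corrected statistic such as Cohen’s kappa. The rating categories and numbers below are hypothetical.

```python
import random
from collections import Counter

# Hypothetical data: 'original' ratings of 200 sampled abstracts, and a set of
# independent re-ratings that mostly, but not always, agree with them.
random.seed(0)
original = [random.choice([1, 2, 3, 4]) for _ in range(200)]
rerated  = [r if random.random() < 0.85 else random.choice([1, 2, 3, 4])
            for r in original]

def cohens_kappa(a, b):
    """Chance-corrected agreement between two sets of categorical ratings."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = Counter(a), Counter(b)
    expected = sum(pa[k] * pb.get(k, 0) for k in pa) / n**2
    return (observed - expected) / (1 - expected)

raw = sum(x == y for x, y in zip(original, rerated)) / len(original)
print(f"Raw agreement: {raw:.2f}")
print(f"Cohen's kappa: {cohens_kappa(original, rerated):.2f}")
```

The point of such a check is that it tests the thing the study actually claims – that the abstract ratings are reproducible – rather than properties of the raters themselves.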
So, this is where I get a little more controversial and somewhat more critical of Richard Tol. The problem I’m having with this whole episode is understanding Richard Tol’s motivation. His claim is that he is simply interested in making sure that a piece of work is robust and done properly. Even if the results are correct – he says – if the strategy is flawed, the work has no merit. However, what I have issues with are Richard’s own style and his own strategy. Firstly, he’s often remarkably rude and unpleasant. I have been told that this is pretty standard in his field, but I find it a strange way to interact with other academics. It shows a lack of decency, and if you’re not willing to be decent, why should others be decent towards you? As far as his strategy goes, he has spent quite a lot of time trying to convince people that John Cook’s reluctance to release all his data implies that he’s trying to hide something. Richard’s paper then consists of a set of statistical tests that the Cook et al. data apparently fail, hence indicating a problem with the work. However, as I tried to explain above, it’s not clear that these tests are actually telling us anything about whether or not the Cook et al. results are robust. They might be perfectly fine tests to run if the goal were to study the people rating the abstracts, but that wasn’t the goal.
Richard Tol’s intentions may well be good and honourable. I obviously can’t claim otherwise. However, from my perspective, this all seems a little suspicious. Make people think that Cook et al. are hiding something, and then, when they do release data, run a set of statistical tests that the data fail. Those who don’t know better will think this means the Cook et al. study is nonsense, when – as far as I can tell – it has told you nothing of the sort. I’m not claiming that there aren’t problems with the Cook et al. study, simply that Richard Tol’s tests aren’t a particularly good indicator of whether or not there are problems. They might indicate something, but until you test the actual abstract ratings, how can you know? It makes me think that Richard is following a similar strategy to that adopted by McIntyre & McKitrick when they tried to debunk Michael Mann’s hockey stick paper (although, with all due respect to Cook et al., I’m not suggesting that the Cook et al. paper is of the same calibre as Michael Mann’s hockey stick paper). If you’re unfamiliar with the story, you can read John Mashey’s comment here. Basically, do something that looks credible but is complicated enough that few will have the knowledge or understanding to know whether it is actually credible or not.
I appreciate that my last paragraph suggests that I think Richard Tol’s motives may not be as pure as – I assume – he would like others to think. If this seems unfair, I apologise. Richard is more than welcome to simply ignore this, of course. It is just a blog post and just my impression, based on what I’ve seen and read. He is, of course, welcome to comment to clarify his position and explain his motives. Whether I respond to his comments may depend on the tone he chooses to use. I am now on leave again, so I’m keen to have a nice relaxing week off before going back to work, where I plan to focus on my lecture preparation, write a chapter of a book, and do some of my own research, rather than reading and commenting on climate science. We’ll see if I succeed.