Given the release of a second batch of hacked emails yesterday, S&R decided to pull this analysis from 2010 back to the front. The conclusions reached in this analysis apply to the emails published in 2011 just as much as they do to the original emails from 2009.
It is impossible to draw firm conclusions from the hacked documents and emails. They do not represent the complete record, and they are not a random selection from the complete record.
– Dr. Timothy Osborn, Climatic Research Unit (source)
After several hundred hours of studying the emails and looking at their references, I have no hesitation in stating that, to my satisfaction, the system is rotten to the core and has been from the start.
– Geoff Sherrington, former corporate geologist (source)
According to Osborn, there is not sufficient context to understand the “true” story behind the published Climatic Research Unit emails and documents. However, according to Sherrington, the emails and references contained therein provide all the context needed in order to conclude that climate change research is complete hogwash. Reality lies somewhere on a continuum between these two extremes – the question is where.
S&R set out to determine whether the published CRU emails provided enough context for the public to condemn or vindicate the scientists involved. After investigating three primary options and reading a key study, S&R has concluded that the emails do not themselves contain sufficient context to understand what really happened in climate science over the last 13 years.
Many people have claimed that the emails contain all the context needed to draw wide-ranging conclusions about climate scientists and climate research in general. For example, critics like Sherrington and Steve McIntyre of Climate Audit claim that the emails contain overwhelming evidence of scientific misconduct and/or a conspiracy among scientists, even though three separate investigations have ruled to the contrary.
Some scientists, on the other hand, have concluded that the emails clearly show that nothing serious happened.
S&R contacted Steve McIntyre for his views on the CRU emails. While he was terse in his email communications, he suggested that his writings at Climate Audit were a good place to start. In December of 2009, shortly after the emails were published, McIntyre wrote that
climate scientists say that the “trick” is now being taken out of context. The Climategate Letters show clearly that the relevant context is the IPCC Lead Authors’ meeting in Tanzania in September 1999 at which the decline in the Briffa reconstruction [of tree ring data] was perceived by IPCC as “diluting the message”, as a “problem”, as a “potential distraction/detraction”.(source)
In addition, McIntyre wrote the following in March of 2010 in another post at Climate Audit:
Once again, the fact that the decline is discussed in a Nature paper does not justify the deletion of the inconvenient data in the IPCC spaghetti graph [of temperature proxies, including tree rings] in order to provide the false rhetorical consistency that IPCC was seeking. (source)
These quotes illustrate that McIntyre feels that there is sufficient context within the emails themselves to prove that several climate scientists had deleted “inconvenient data” regarding tree rings in service of a political end, namely the removal of a “potential distraction.” This is a charge that, if true, would constitute a serious breach of scientific ethics.
Other critics have claimed that there is abundant context to prove conspiracy by CRU climate researchers and their US associates. Tom Fuller, co-author of Climategate: The CRUtape Letters (Volume 1) with Steven Mosher, posted excerpts from the book at his website. Two key excerpts are quoted below:
The scientists known as ‘The Team’ [Phil Jones, Michael Mann, Keith Briffa, et al] hid evidence that their presentation for politicians and policy makers was not as strong as they wanted to make it appear, downplaying the very real uncertainties present in climate reconstruction….
But the leaked files showed that The Team had done this by hiding how they presented data, and ruthlessly suppressing dissent by insuring that contrary papers were never published and that editors who didn’t follow their party line were forced out of their position.(source)
These quotes demonstrate that Fuller believes that the emails reveal a conspiracy by “The Team” to overstate the certainty of climate disruption, conceal evidence to the contrary, and manipulate the peer-review system. This is very much in line with something else Sherrington said:
Yes, there WAS a conspiracy and if you cannot find it [in the emails] then you do not have the innate ability to interpret data. (emphasis original, source)
It’s not just critics who believe the emails contain sufficient context to know what really happened; some climate researchers make this claim as well. In an interview with S&R, Martin Vermeer, first author of a recent PNAS paper on sea level rise, claimed that “I have plenty context to recognise that none of the allegations hold water, even without seeing the balance of the emails.”
[6/3/10 correction: As per the comments below, Vermeer specifically meant “‘the allegations’ refers specifically to the charges of scientific misconduct / breach of ethics. And ‘plenty context’ includes the scientific literature, and knowing first hand how research is done.”]
At this point, the claims of conspiracy and misconduct made by critics have nearly all been rejected by the first Penn State inquiry, the UK House of Commons inquiry, and the Oxburgh panel. In fact, only a single serious claim leveled by critics against climate scientists has been substantiated by any of the investigations – that Phil Jones and the University of East Anglia were not sufficiently open to granting Freedom of Information requests. Given that none of the three inquiries has found scientific misconduct and only one found a possible ethical breach (the FOI issue), it’s reasonable to conclude that the CRU emails alone lack sufficient context for broad claims. This is contrary to what Fuller, McIntyre, and Sherrington have said.
Clearly, reality does not lie close to this end of the continuum. But do the emails contain enough context to make even some limited claims? S&R asked this question of Steven Mosher, and his response was essentially “yes:”
Just as missing data in some areas of climate science doesn’t prevent us from making rational statements about global warming, so too the fact of missing mails does not prevent us from describing clearly what we do know about the mails.
Mosher also said that we know enough context to prove that there was a widespread breakdown in scientific ethics among climate researchers. In addition, Mosher claims that both he and his co-author Tom Fuller feel that the emails revealed nothing that alters the conclusions of climate disruption research to date, saying
[t]he charge that we made in our book was not directed at the science. As we argued, the mails do not and cannot change the science. (emphasis mine)
This claim is inconsistent with the excerpts from their book that Fuller quotes at his blog, and it is squarely in conflict with what McIntyre says.
However, Mosher did feel that there was enough evidence to cast significant doubt upon the ethics of the researchers themselves. For example, Mosher said that it was clear from the emails that CRU’s Phil Jones lied to Parliament when he said it was standard practice to not share data. Two other examples that Mosher used in his S&R interview are quoted below:
For example, the issue of “hiding the decline.” That issue is not about Jones hiding data or manipulating data or committing fraud. That particular instance is about the crafting of a message for politicians….
The mails clearly demonstrate that the scientists were concerned about “diluting” the message. They were not concerned with telling the whole truth, but rather a version of the truth that was packaged according to their agenda.
When S&R asked Gavin Schmidt, a NASA climate scientist and contributor to the blog Real Climate, for his opinion on the supposed breaches of ethics, he replied
[t]he only issue that can be classed as serious lapse of judgment is Phil Jones dealings with the FOI requests for the IPCC-related emails.
He pointed out that there was a great deal more context within the emails than is being reported on by the media. Specifically, he said
The public/blogosphere discussion is so focused on a tiny bit of the issue (basically MBH98/99 and the CRU surface temperature record to the exclusion of everything else) that most of the contextual things I brought up [at the blog Real Climate] were the fact that most of the discussions had nothing to do with either of these things.
Instead, Schmidt said that the emails show scientists having substantive discussions and disagreements about “uncertainties in the data, the impacts of different techniques.” To Schmidt, the emails show “the primacy of scientific issues over personalities” and reveal that the scientists mentioned in the emails are “more than willing to have out all the issues.”
Because the House of Commons investigation did find some problems with how the University of East Anglia and CRU handled FOI requests, the emails evidently did contain enough context to draw at least one strictly limited conclusion. The House of Commons did not, however, claim that Jones lied in his testimony as Mosher claimed, and in fact accepted Jones’ explanation that widespread sharing of computer code and raw data is a recent (Internet age) development that the CRU scientists have not kept up with. In addition, Fuller and McIntyre both claimed that the emails contained enough context to prove scientific misconduct, but Fuller’s co-author Mosher claimed that there wasn’t enough context. Given the inconsistencies even between co-authors of the same book, it’s apparent that the emails alone lack enough context to support an unambiguous conclusion one way or the other on the broad issues of ethics and misconduct.
The other end of the continuum claims, as Osborn does, that it is not possible to draw any conclusions from the emails alone. As Mike Hulme, climate scientist from CRU, said in an S&R interview, “No-one ever (ever) has the full story.” If that’s true, then the only way to get to the bottom of what really happened is to talk to the people involved, review information other than the emails, and so on. A simple analysis of the number of emails published vs. the number of emails sent and received by CRU scientists supports this. However, both McIntyre and Mosher feel that a simple numerical analysis is pointless speculation given what we know about the context of the CRU emails. And for his part, Schmidt says that the inquiries completed to date have already investigated the wider context of the emails and found the critics’ points untenable.
S&R surveyed its own members as well as Tom Wigley to estimate how many emails were sent per year by different occupations. We found that
- approximately 1,500 emails per year were sent by the electrical engineer
- approximately 1,100 emails per year were sent by the home manager
- between 2,500 and 3,500 emails per year were sent by the marketing professional
- about 1,500 emails per year were sent by the university English professor
- and about 5,500 emails per year were sent by climate scientist Wigley (with another 33,000 emails received).
If we estimate that the S&R writers surveyed each receive three emails for every email sent, then we get a yearly total of 6,000 emails, 4,400 emails, 10,000 emails, and 6,000 emails respectively for the S&R writers, plus a total of about 39,000 emails per year for Wigley. Over the course of 13 years and for a 15-member workgroup (the period of the CRU emails and the size of the CRU), the total for both the electrical engineer and the English professor is 1.17 million emails, 858,000 emails for the home manager, a minimum of 1.95 million emails for the marketing professional, and 7.51 million emails for Wigley. This compares to about 1,100 emails published from CRU’s servers. If we treated the emails as data, then we’d be drawing conclusions based on 0.01% (climatology) to 0.13% (home management) of the data, a sample that has also been selected using unclear criteria for unclear reasons.
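For readers who want to check the arithmetic, the estimate above can be reproduced with a short script. This is only a sketch of the calculation as described in the text: the variable and function names are invented for illustration, the three-received-per-sent ratio is the article's own assumption, and the marketing professional is entered at the low end of the 2,500 to 3,500 range.

```python
# Rough recomputation of the email-volume estimate described above.
# All names here are illustrative; the figures come from the S&R survey.

YEARS = 13                # period covered by the published CRU emails
WORKGROUP_SIZE = 15       # approximate size of the CRU
PUBLISHED_EMAILS = 1_100  # approximate number of published emails
RECEIVED_PER_SENT = 3     # assumed ratio for the S&R writers

# Emails sent per year, per person
sent_per_year = {
    "electrical engineer": 1_500,
    "home manager": 1_100,
    "marketing professional": 2_500,  # low end of the 2,500-3,500 range
    "English professor": 1_500,
}


def yearly_total(sent, received_per_sent=RECEIVED_PER_SENT):
    """Sent plus received emails per person per year."""
    return sent * (1 + received_per_sent)


def workgroup_total(per_person_per_year):
    """Scale one person's yearly volume to the whole group and period."""
    return per_person_per_year * YEARS * WORKGROUP_SIZE


def published_share(total):
    """Published emails as a percentage of the estimated total."""
    return 100 * PUBLISHED_EMAILS / total


yearly = {job: yearly_total(n) for job, n in sent_per_year.items()}
# Wigley reported sent and received counts directly (38,500 in total,
# which the text rounds to "about 39,000").
yearly["climate scientist (Wigley)"] = 5_500 + 33_000

totals = {job: workgroup_total(n) for job, n in yearly.items()}

for job, total in totals.items():
    print(f"{job}: {total:,} emails, "
          f"published share {published_share(total):.2f}%")
```

The script lands on the same endpoints as the text: roughly 0.13% at the home-manager rate and roughly 0.01% at Wigley's rate.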
Mosher rejects this data-like approach, however, believing that it ignores smoking guns.
As a defense the appeal to missing context is laughable. Imagine the account who authorizes the issuance of millions of checks over a lifetime of service. Imagine finding one which he writes to himself embezzling a million dollars. Can he appeal to the millions of good checks he wrote to divert attention from the bogus one?
In addition, both Mosher and McIntyre believe that this approach is inconsistent or hypocritical as it relies upon a defense of “we don’t know, so reality must be the way we want it to be.” For example, McIntyre said
If Osborn wishes to argue that the emails are mitigated by context, then he should provide the other emails that demonstrate the mitigation. Until they do so, I don’t see any reason to take seriously the idea that additional emails would show mitigation or why it is unreasonable to proceed on the record that is available.
Similarly, Mosher wrote
Again, if Osborn and Briffa and Jones want to supply a context, either by answering questions or producing mails, they are free remediate their reputations. They choose not to supply additional evidence. They can’t. Because there is no context that makes what they did right….
There isn’t any context which can make it better. If there was, they would produce it.
This is inconsistent with Mosher’s earlier statement that “the appeal to missing context is laughable.” He is engaging in the very same speculation about context that he dismissed as “below serious discussion” and “intellectual buffoonery” in his S&R interview, except he’s appealing to missing data that, in his opinion, would only strengthen, not weaken, support for his criticisms.
The numerical analysis suggests that, as Osborn said above, there aren’t enough emails to understand their context. It would take an inquiry or three to truly understand what happened and why, a point that Schmidt made as well:
Tim Osborne is absolutely correct that there is much more context than is in the emails – and much of this has been brought up in the various inquiries – and that strongly supports the contention that no misconduct or wrongdoing occurred. (emphasis original)
This investigation has largely rested upon logic rather than on data. But there is some research data upon which we can base stronger conclusions. Specifically, Jorge Aranda and Gina Venolia wrote a paper titled “The Secret Life of Bugs: Going Past the Errors and Omissions in Software Repositories” that was published in the Proceedings of the 31st International Conference on Software Engineering. It reports on research the authors did on the reliability of electronic records like software bug databases. However, their methods and conclusions have a much broader application to the question of the reliability of all research that is based exclusively on electronic records like the published CRU emails.
Aranda and Venolia started by looking at randomly chosen records in an electronic bug tracking database and extracting as much information as they could from the information stored in the database. The authors then contacted all the people mentioned and interviewed them to get a better understanding of the status of the bugs, what occurred, who was responsible, etc. The authors reviewed email records, documentation, and any other artifacts related to the bug that they could find, and they always tracked the bugs to their origination and completion. In the process, the authors described four different levels of analysis:
- Level 1: “automated analysis of bug record data”
- Level 2: “automated analysis of electronic conversations and other repositories”
- Level 3: “human sense-making”
- Level 4: “direct accounts of the history by its participants”
The authors performed their analysis at Level 4, the most detailed level. For comparison, the CRU emails themselves represent a Level 2 analysis; the work of McIntyre, Mosher, Fuller, most bloggers, and many journalists represents a Level 3 analysis; and the major inquiries represent a Level 4 analysis.
Tables 2 and 3 from the Aranda/Venolia paper, showing how many new events and new participants were discovered at each level
What the authors found in the course of their analysis was that “the differences between levels were stark, quantitatively and qualitatively.”
In fact, even considering all of the electronic traces of a bug that we could find (repositories, email conversations, meeting requests, specifications, document revisions, and organizational structure records), in every case but one the histories omitted important details about the bug. (emphasis mine)
In more specific terms, the paper found that the electronic records, including the email records, were missing data or contained incorrect data, failed to include events that were critical to solving the bug, didn’t describe structural issues and problems related to group dynamics and internal company politics, and had very little explanation of why things were done. For example, the authors found that the steps required to reproduce a bug, the list of corrective actions taken to try and fix the problem, and the root cause were often missing from the electronic records. The bugs often had lifespans that started in advance of the official record or ended either far before or well after the bug was actually declared “fixed.” The authors found that the officially responsible person (i.e., the bug’s “owner”) was not the person actually responsible for fixing the bug 34% of the time and was totally unrelated to the bug 11% of the time. Furthermore, in 7% of the bugs, all of the people listed in the electronic record had no relationship at all to the bug.
One of the key results of the paper was that, in all cases, Aranda and Venolia found that they couldn’t understand the rationale behind the activity described in the artifacts without actually talking to the people involved. For example, they couldn’t understand why a bug languished unchanged for months but was then suddenly fixed after a period of furious activity, why some bugs weren’t fixed at all, or why someone else filed a bug report even though the bug was suspected to be a false alarm.
The results all point to a few key conclusions. First, “electronic repositories hold incomplete or incorrect data more often than not.” Second, “the bugs in our case pool had far richer and more complex stories than would appear by automatically collecting and analyzing their electronic traces.” Third, Level 4 analyses do not just produce longer stories – the stories “change qualitatively in ways that are deeply relevant to the study of coordination.” And finally,
It is unrealistic to expect all events related to a bug to be found in its record or through its electronic traces. Naturally, most face-to-face events left no trace in any repository. But in some occasions, the key events in the story of a bug had left no electronic trace; the only way to discover them was through interviews with the participants.
So what does this mean for the context of the published CRU emails? Can we trust analyses of the purely electronic record of the published CRU emails alone to provide us the context we need to understand whether there was scientific misconduct or ethical breaches? The answer has to be “no,” and Aranda and Venolia state this directly:
The most striking lesson from our cases is the deep unreliability of electronic traces, and particularly of the bug records, which were erroneous or misleading in seven of our ten cases, and incomplete and insufficient in every case. (emphasis mine)
This is indirectly supported by Mosher’s own analysis of the emails. He pointed out that
[t]here are enough of these mails, mails which have nothing of import, to suggest that the mails were collected and filtered by a harvesting program. A program that looked for certain authors, and certain key words. (emphasis mine)
Automated filtering and processing of emails is, at best, a Level 2 analysis, but what is needed to truly understand the context of the emails is a Level 4 analysis.
Mike Hulme also agrees with Aranda and Venolia’s conclusions, saying
[t]he released emails are only a fraction of all the correspondence between the relevant scientists, not to mention telephone calls, breakfast conversations, texts, etc., etc., etc.
If, as the paper’s authors found, in every one of their cases the electronic records that had been searched by automated means were “incomplete and insufficient,” then reality lies somewhere between “we have enough context to draw limited conclusions” and “we don’t have enough context to draw any conclusions.”
This leaves us in the position of having to rely on the results of the five different inquiries and investigations that have been completed or will be completed soon. McIntyre pointed out that “it’s a matter of record that the Oxburgh and Penn State inquiries didn’t take any submissions or testimony from Climategate targets or critics,” and by the guidelines of the paper, this could in fact be a problem. However, Schmidt pointed out that
The House of Commons inquiry did take submissions – and the critics have dismissed that too. Muir-Russell has as well, and I guarantee the same complaints will be heard when that reports too.
As with scientific research, when multiple lines of investigation all come to the same conclusions, the likelihood that the conclusion is correct increases significantly. This is also true of the Climategate inquiries – having one or two fail to ask McIntyre his opinion or accept outside submissions doesn’t automatically negate their conclusions, especially when those conclusions are similar to the conclusions of the inquiries that did take his submissions. There are multiple possible reasons why anyone’s input might not be sought out.
In his interview, Mosher appeared to agree with Aranda and Venolia that a Level 4 analysis was warranted.
To put the emails in a full context in every case would require one thing. Somebody with knowledge of the mails sitting down with Jones, Briffa, Osborn and others to ask them a few simple questions.
However, the inquiries have already done what Mosher suggests and yet he, McIntyre, and other critics remain unwilling to accept the conclusions of the inquiries.
Given the demonstrated unreliability of electronic records that have been sorted or analyzed using automated tools, it’s unreasonable to make firm claims of scientific misconduct, ethical lapses, or illegality based on only the published CRU emails. It takes full inquiries and investigations where the investigators talk with the involved parties to truly understand the details and the context surrounding claims like those made against the climate scientists mentioned in the published CRU emails. To date, three such inquiries have been completed, and while there may be some areas where the inquiries can be fairly criticized, the fact that the results of all three agree with each other strongly suggests that Tim Osborn’s claim, rather than Geoff Sherrington’s, is closer to correct in this case – “It is impossible to draw firm conclusions from the hacked documents and emails.”
Special thanks to Prof. Steve Easterbrook, who pointed me toward the Aranda and Venolia paper.