Degaussing the Iraq Body Count
January 9, 2006
Discussion Thread - Comment# 543
No one knows how many Iraqi civilians have died as a result of the carnage wrought by the Iraq War. But that number goes to the heart of the Bush Administration's Clear - Hold - Build strategy. The President recently opined in an airy sound bite to the press that 30,000 Iraqi civilians may have died as a consequence of the Iraq War, "more or less." He did not say where this number came from, which raises the question of sourcing, nor did he mention the methodology by which it was estimated, which raises a question of reliability. The press bit on the sound bite and, as expected, did not follow up; not surprisingly, the issue has receded from the public consciousness. After all, the Super Bowl is coming and the NASCAR set is eagerly anticipating a new season.
My good friend Andrew Cockburn has not been so distracted. He has probed the body count question by enlisting the help of another good friend, Pierre Sprey, who is a world-recognized expert in the mathematics of non-parametric statistics.
Cockburn goes to great lengths in the following essay to make Sprey's rigorous conclusions accessible to the layman. Unlike most statisticians, Sprey knows how to let the sun shine in. He lets the actual observations do the talking, rather than using the parametric method preferred by the majority of statisticians (and almost all economists). The parametric method of statistical analysis requires the practitioner to force-fit observations into the theoretical straitjacket of the so-called "normal" bell curve, or more formally, the Gaussian distribution. I must admit a strong bias here: my own experience has convinced me utterly that the so-called "normal" distribution is so rare that calling it an "abnormal" distribution would be excessively charitable. I have long been a proponent of de-gaussing statistics when they don't fit a Gaussian distribution, which is to say, whenever the real "normal" condition intrudes on an analysis of real-world data.
I urge readers to study Cockburn's report carefully, because the Bush Administration's Clear - Hold - Build strategy [see Blaster #543] self-evidently depends on winning the hearts and minds of the Iraqi people, and a strategy that is either directly or indirectly causing large numbers of Iraqi civilians and children to suffer and die is unlikely to move us closer to this goal. Someone ought to tell the President that knowing the real number is a little more important than a sound bite. Duh.
30,000? No. 100,000? No.
By ANDREW COCKBURN
[re-printed with permission]
President Bush's off-hand summation last month of the number of Iraqis who have so far died as a result of our invasion and occupation as "30,000, more or less" was quite certainly an under-estimate. The true number is probably hitting around 180,000 by now, with a possibility, as we shall see, that it has reached as high as half a million.
But even Bush's number was too much for his handlers to allow. Almost as soon as he finished speaking, they hastened to downplay the presidential figure as "unofficial", plucked by the commander in chief from "public estimates". Such calculations have been discouraged ever since the oafish General Tommy Franks infamously announced at the time of the invasion: "We don't do body counts". In December 2004, an effort by the Iraqi Ministry of Health to quantify ongoing mortality on the basis of emergency room admissions was halted by direct order of the occupying power.
In fact, the President may have been subconsciously quoting figures published by iraqbodycount.org, a British group that diligently tabulates published press reports of combat-related killings in Iraq. Due to IBC's policy of posting minimum and maximum figures, currently standing at 27,787 and 31,317, their numbers carry a misleading air of scientific precision. As the group itself readily concedes, the estimate must be incomplete, since it omits unreported deaths.
There is, however, another and more reliable method for estimating figures such as these: nationwide random sampling. No one doubts that, if the sample is truly random, and the consequent data correctly calculated, the sampled results reflect the national figures within the stated accuracy. That, after all, is how market researchers assess public opinion on everything from politicians to breakfast cereals. Epidemiologists use it to chart the impact of epidemics. In 2000 an epidemiological team led by Les Roberts of Johns Hopkins School of Public Health used random sampling to calculate the death toll from combat and consequent disease and starvation in the ongoing Congolese civil war at 1.7 million. This figure prompted shocked headlines and immediate action by the UN Security Council. No one questioned the methodology.
In September 2004, Roberts led a similar team that researched death rates, using the same techniques, in Iraq before and after the 2003 invasion. Making "conservative assumptions" they concluded that "about 100,000 excess deaths" (in fact 98,000) among men, women, and children had occurred in just under eighteen months. Violent deaths alone had soared twentyfold. But, as in most wars, the bulk of the carnage was due to the indirect effects of the invasion, notably the breakdown of the Iraqi health system. Thus, though many commentators contrasted the iraqbodycount and Johns Hopkins figures, they are not comparable. The bodycounters were simply recording, or at least attempting to record, deaths from combat violence, while the medical specialists were attempting something far more complete, an accounting of the full death toll wrought by the devastation of the US invasion and occupation.
Unlike the respectful applause granted the Congolese study, this one, published in the prestigious British medical journal The Lancet, generated a hail of abusive criticism. The general outrage may have been prompted by the unsettling possibility that Iraq's liberators had already killed a third as many Iraqis as the reported 300,000 murdered by Saddam Hussein in his decades of tyranny. Some of the attacks were self-evidently absurd. British Prime Minister Tony Blair's spokesman, for example, queried the survey because it "appeared to be based on an extrapolation technique rather than a detailed body count", as if Blair had never made a political decision based on a poll. Others chose to compare apples with oranges by mixing up nationwide Saddam-era government statistics with individual cluster survey results in order to cast doubt on the latter.
Some questioned whether the sample was distorted by unrepresentative hot spots such as Fallujah. In fact, the amazingly dedicated and courageous Iraqi doctors who actually gathered the data visited 33 "clusters" selected on an entirely random basis across the length and breadth of Iraq. In each of these clusters the teams conducted interviews in 30 households, again selected by rigorously random means. As it happened, Fallujah was one of the clusters thrown up by this process. Strictly speaking, the team should have included the data from that embattled city in their final result - random is random after all -- which would have given an overall post-invasion excess death figure of no less than 268,000. Nevertheless, erring on the side of caution, they eliminated Fallujah from their sample.
For such dedication to scholarly integrity, Roberts and his colleagues had to endure the flatulent ignorance of Michael E. O'Hanlon, sage of the Brookings Institution, who told the New York Times that the self-evidently deficient Iraqbodycount estimate was "certainly a more serious work than the Lancet report".
No point in the study attracted more confident assaults by ersatz statisticians than the study's passing mention of a 95 per cent "confidence interval" for the overall death toll of between 194,000 and 8,000. This did not mean, as asserted by commentators who ought to have known better, that the true figure lay anywhere between those numbers and that the 98,000 number was produced merely by splitting the difference. In fact, the 98,000 figure represents the best estimate drawn from the data. The high and low numbers represented the spread, known to statisticians as "the confidence interval", within which it is 95 per cent certain the true number will be found. Had the published study (which was intensively peer reviewed) cited the 80 per cent confidence interval also calculated by the team - a statistically respectable option -- then the spread would have been between 152,000 and 44,000.
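As a rough illustration of why a lower confidence level gives a narrower spread around the same best estimate, here is a minimal Python sketch of a symmetric Gaussian interval. The standard error is a made-up value chosen only for illustration; it is not taken from the study, and the sketch is not meant to reproduce the published bounds.

```python
from statistics import NormalDist


def gaussian_interval(estimate, std_error, level):
    """Two-sided interval, estimate +/- z * std_error, under a normal assumption."""
    # z is the standard normal quantile that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


BEST_ESTIMATE = 98_000    # best estimate quoted in the text
HYPOTHETICAL_SE = 45_000  # assumption for illustration only, not from the study

for level in (0.80, 0.95):
    low, high = gaussian_interval(BEST_ESTIMATE, HYPOTHETICAL_SE, level)
    print(f"{level:.0%} interval: {low:,.0f} to {high:,.0f}")
```

In both cases the best estimate sits at the center of the interval; only the width changes with the confidence level.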
Seeking further elucidation on the mathematical tools available to reveal the hidden miseries of today's Iraq, I turned to CounterPunch's consultant statistician, Pierre Sprey. He reviewed not only the Iraq study as published in the Lancet, but also the raw data collected in the household survey and kindly forwarded me by Dr. Roberts.
"I have the highest respect for the rigor of the sampling method used and the meticulous and courageous collection of the data. I'm certainly not criticizing in any way Roberts's data or the importance of the results. But they could have saved themselves a lot of trouble had they discarded the straitjacket of Gaussian distribution in favor of a more practical statistical approach", says Sprey. "As with all such studies, the key question is that of 'scatter', i.e. the random spread in data between each cluster sampled. So cluster A might have a ratio of twice as many deaths after the invasion as before, while cluster B might experience only two thirds as many. The academically conventional approach is to assume that scatter follows the bell shaped curve, otherwise known as 'normal distribution,' popularized by Carl Gauss in the early 19th century. This is a formula dictating that the most frequent occurrence of data will be close to the mean, or center, and that frequency of occurrence will fall off smoothly and symmetrically as data scatters further and further from the mean - following the curve of a bell shaped mountain as you move from the center of the data.
"Generations of statisticians have had it beaten into their skulls that any data that scatters does so according to the iron dictates of the bell shaped curve. The truth is that in no case has a sizable body of naturally occurring data ever been proven to follow the curve". (A $200,000 prize offered in the 1920s for anyone who could provide rigorous evidence of a natural occurrence of the curve remains unclaimed.)
"Slavish adherence to this formula obscures information of great value. The true shape of the data scatter almost invariably contains insights of great physical or, in this case, medical importance. In particular it very frequently grossly exaggerates the true scatter of the data. Why? Simply because the mathematics of making the data fit the bell curve inexorably leads one to placing huge emphasis on isolated extreme 'outliers' of the data.
"For example, if the average cluster had ten deaths and most clusters had 8 to 12 deaths, but some had 0 or 20, the Gaussian math would force you to weight the importance of those rare points like 0 or 20 (i.e. 'outliers') by the square of their distance from the center, or average. So a point at 20 would have a weight of 100 (20 minus 10, squared) while a point of 11 would have a weight of 1 (11 minus 10, squared).
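The arithmetic in that example can be checked with a short Python sketch. The list of per-cluster deaths is hypothetical, invented here only to match the description (most clusters at 8 to 12, two outliers at 0 and 20, average of 10); it is not survey data.

```python
def squared_weight(value, center):
    """Squared distance from the center, the weight described in the text."""
    return (value - center) ** 2


# Hypothetical deaths per cluster, invented for illustration; the mean is 10.
CLUSTER_DEATHS = [8, 9, 10, 10, 11, 12, 9, 11, 0, 20]
CENTER = sum(CLUSTER_DEATHS) / len(CLUSTER_DEATHS)

weights = [squared_weight(d, CENTER) for d in CLUSTER_DEATHS]

# Share of the total squared deviation contributed by the two outliers.
outlier_share = (squared_weight(0, CENTER) + squared_weight(20, CENTER)) / sum(weights)

print("weight of a point at 20:", squared_weight(20, CENTER))
print("weight of a point at 11:", squared_weight(11, CENTER))
print(f"share of total squared deviation from the two outliers: {outlier_share:.0%}")
```

In this made-up list the two outliers account for 200 of the 212 units of squared deviation, about 94 per cent of the total, even though they are only two of ten clusters.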
"This approach has inherently pernicious effects. Suppose for example one is studying survival rates of plant-destroying spider mites, and the sampled population happens to be a mix of a strain of very hardy mites and another strain that is quite vulnerable to pesticides. Fanatical Gaussians will immediately clamp the bell shaped curve onto the overall population of mites being studied, thereby wiping out any evidence that this group is in fact a mixture of two strains.
"The commonsensical amateur meanwhile would look at the scatter of the data and see very quickly that instead of a single "peak" in surviving mites, which would be the result if the data were processed by traditional Gaussian rules, there are instead two obvious peaks. He would promptly discern that he has two different strains mixed together on his plants, a conclusion of overwhelming importance for pesticide application".
(Sprey once conducted such a statistical study at Cornell - a bad day for mites.)
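The mite example can be sketched in a few lines of Python. The survival percentages below are invented for illustration (they are not from Sprey's Cornell study), and the peak-counting rule is a deliberately crude assumption: a bin counts as a peak if it holds more observations than both of its neighbors.

```python
from statistics import mean, stdev

# Hypothetical survival percentages per plant: a vulnerable strain near 20
# and a hardy strain near 80. Invented numbers, for illustration only.
SURVIVAL_PCT = [12, 15, 18, 18, 20, 21, 22, 22, 25, 28,
                72, 75, 78, 79, 80, 81, 82, 85, 85, 88]


def bin_counts(values, width=10, upper=100):
    """Count observations in bins [0, width), [width, 2 * width), ..."""
    counts = [0] * (upper // width)
    for v in values:
        counts[min(v // width, len(counts) - 1)] += 1
    return counts


def count_peaks(counts):
    """Count bins that are strictly higher than both neighboring bins."""
    padded = [0] + counts + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


counts = bin_counts(SURVIVAL_PCT)
for i, c in enumerate(counts):
    print(f"{i * 10:3d}-{i * 10 + 9:<3d} {'#' * c}")

print("mean:", mean(SURVIVAL_PCT), "standard deviation:", round(stdev(SURVIVAL_PCT), 1))
print("peaks in the histogram:", count_peaks(counts))
```

With these numbers the mean lands in a bin that contains no observations at all, while the histogram shows the two separate clumps at a glance.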
So how to escape the Gaussian distortion?
"The answer lies in quite simple statistical techniques called 'distribution free' or 'non parametric' methods. These make the obviously more reasonable assumption that one hasn't the foggiest notion of what the distribution of the data should be, especially when considering data one hasn't seen before -- one is prepared to let the data define its own distribution, whatever that unusual shape may be, rather than forcing it into the bell curve. The relatively simple computational methods used in this approach basically treat each point as if it has the same weight as any other, with the happy result that outliers don't greatly exaggerate the scatter.
"So, applying that simple notion to the death rates before and after the US invasion of Iraq, we find not only that the confidence intervals around the estimated 100,000 "excess deaths" shrink considerably but also that the numbers move significantly higher. With a distribution-free approach, a 95 per cent confidence interval thereby becomes 53,000 to 279,000. (Recall that the Gaussian approach gave a 95 per cent confidence interval of 8,000 to 194,000.) With an 80 per cent confidence interval, the lower bound is 78,000 and the upper bound is 229,000. This shift to higher excess deaths occurs because the real, as opposed to the Gaussian, distribution of the data is heavily skewed to the high side of the distribution center".
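The essay does not say which distribution-free technique Sprey applied, so the following Python sketch should be read only as one textbook example of the family: a confidence interval for the median built from order statistics, which assumes nothing about the shape of the distribution. The cluster ratios are hypothetical values invented for illustration.

```python
from math import comb


def median_interval(values, level=0.95):
    """Distribution-free confidence interval for the median.

    Returns the k-th smallest and k-th largest observations, where k is the
    largest rank whose binomial(n, 1/2) tail probability stays within
    (1 - level) / 2 on each side.
    """
    xs = sorted(values)
    n = len(xs)
    tail = (1 - level) / 2
    cumulative = 0.0
    k = 0
    for j in range(n // 2):
        cumulative += comb(n, j) / 2 ** n  # P(at most j points below the median)
        if cumulative > tail:
            break
        k = j + 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    return xs[k - 1], xs[n - k]


# Hypothetical post/pre-invasion death ratios per cluster, invented for
# illustration, with one extreme outlier.
RATIOS = [0.7, 0.9, 1.1, 1.2, 1.4, 1.5, 1.6, 1.8, 2.1, 9.5]
EXTREME = RATIOS[:-1] + [95.0]  # same data, outlier made ten times larger

print("95% interval:", median_interval(RATIOS, 0.95))
print("80% interval:", median_interval(RATIOS, 0.80))
print("95% interval, larger outlier:", median_interval(EXTREME, 0.95))
```

Because every point counts once regardless of how far out it lies, making the outlier ten times larger leaves the interval unchanged, which is the property the quoted passage describes.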
Sprey's results make it clear that the most cautious estimate possible for the Iraqi excess deaths caused by the US invasion is far higher than the 8,000 figure imposed on the Johns Hopkins team by the fascist bell curve. (The eugenicists of the 1920s were much enamored of Gaussian methodology.) The upper bounds indicate a reasonable possibility of much higher excess deaths than the 194,000 excess deaths (95 per cent confidence) offered in the study published in the Lancet.
Of course the survey on which all these figures are based was conducted fifteen months ago. Assuming the rate of death has proceeded at the same pace since the study was carried out, Sprey calculates that deaths inflicted to date as a direct result of the Anglo-American invasion and occupation of Iraq could be, at best estimate, 183,000, with an upper 95 per cent confidence boundary of 511,000.
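The extrapolation can be approximated with simple proportional scaling. This Python sketch assumes a survey window of 18 months (the text says "just under eighteen") followed by 15 further months at the same rate; Sprey's exact inputs are not given, so the match with the quoted figures is only approximate.

```python
def extrapolate(excess_deaths, survey_months, months_since):
    """Scale a death toll forward in time, assuming a constant monthly rate."""
    total_months = survey_months + months_since
    return excess_deaths * total_months / survey_months


SURVEY_MONTHS = 18  # assumption: "just under eighteen months", rounded
MONTHS_SINCE = 15   # "fifteen months ago"

best = extrapolate(100_000, SURVEY_MONTHS, MONTHS_SINCE)   # rounded best estimate
upper = extrapolate(279_000, SURVEY_MONTHS, MONTHS_SINCE)  # distribution-free 95% upper bound

print(f"best estimate to date: {best:,.0f}")
print(f"upper 95% bound to date: {upper:,.0f}")
```

Under these assumptions the results come out at roughly 183,000 and 511,000, in line with the figures quoted above.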
Given the generally smug and heartless reaction accorded the initial Lancet study, no such updated figure is likely to resonate in public discourse, especially when it registers a dramatic increase. Though the figures quoted by Bush were without a shadow of a doubt a gross underestimate (he couldn't even be bothered to get the number of dead American troops right) 30,000 dead among the people we were allegedly coming to save is still an appalling notion. The possibility that we have actually helped kill as many as half a million people suggests a war crime of truly twentieth century proportions.
In some countries, denying the fact of mass murder is considered a felony offence, incurring harsh penalties. But then, it all depends on who is being murdered, and by whom.
Andrew Cockburn is the co-author, with Patrick Cockburn, of Out of the Ashes: the Resurrection of Saddam Hussein.
"A popular government without popular information, or the means of acquiring it, is but a prologue to a farce or a tragedy, or perhaps both. Knowledge will forever govern ignorance, and a people who mean to be their own governors must arm themselves with the power which knowledge gives." - James Madison, from a letter to W.T. Barry, August 4, 1822
[Disclaimer: In accordance with 17 U.S.C. 107, this material is distributed without profit or payment to those who have expressed a prior interest in receiving this information for non-profit research and educational purposes only.]