Thursday, August 7, 2014

Statistics and wars -- on the number of civilian casualties in Gaza

A cautionary note: this may be a bit of a sensitive topic to have for a first blog post.

During the last couple of days I encountered a couple of analyses questioning the official figures given in the media about Palestinian civilian casualties during the period of operation Protective Edge in Gaza. Currently, the UN puts the death toll at a minimum of 72% and Gazan groups at 82% and 84%. As a suspicious believer in the power of data, I was interested to read the critical analyses of the Gazan casualty distribution presented by JoËoLTime and others. These analyses claim that the actual number of civilian deaths must be lower than the one provided by the Palestinians. They base their argument on data that shows that 20-30 year old males are much more likely to be casualties than any other gender-age group in Gaza,



This argument is plausible. Hamas and the Palestinians have a strong incentive to distort the number of civilian casualties. One reason is external -- to increase the critique of Israel in the international stage. The other is internal -- to claim that Hamas is resilient and sustained small number of casualties when fighting the IDF. However, we need to be very careful when dealing with statistics -- they can become a powerful tool to understand the world but they are also known to be bold liers. In case of the Israeli-Palestinian conflict, it is doubly so since many people have very strong feelings about it. With that in mind I decided to give a look to the numbers myself.

To perform my own analysis I follow the data sources suggested by JoËoL. I use the casualty list compiled by Al-Jazira, which currently has 1775 entries and does not state whether an individual was a civilian or a militant. There are 1313 entries for which the name, the age, and the gender of the casualty are given. I also use the CIA factbook's data on the population distribution in Gaza. The data can be presented by the two graphs below:


The graphs show the distribution of Palestinian casualties in Gaza during operation Protective Edge (in green) and the general Gazan population distribution (in orange). Because it is a reasonable assumption that militants are more likely to be male than female, I separated the data to male and female groups.

Now, if the IDF had really killed Palestinians at random, then the green and orange bars would have been more or less equal. It is immediately clear that the distribution of male fatalities is highly skewed from what we would expect if fatalities had been completely random. With female fatalities it is less clear -- there is one clear jump in fatalities for the 25-29 age bracket and one clear dip in the fatalities for the 5-9 age bracket.

To make things simple, I will focus on the male population. What factors can drive the difference between actual casualties and the distribution of the population? Here are a few possibilities:
  1. Hamas militants are mostly in the 20-29 age bracket, and they are much more likely to be killed by IDF forces.
  2. Men in the 20-29 age bracket are more likely to be mistakenly thought by IDF soldiers to be Hamas militants.
  3. Children in the 0-14 age bracket are better protected by their families from IDF strikes and are less likely to be mistaken by IDF soldiers to be Hamas militants.
The other analyses of this data only consider the first possibility, explicitly or implicitly. Lets see how that would work. To make matters simple we can use the children fatality rate as an indication of what the random fatality rate is (e.g. casualties caused by mistakes and due to collateral damage). This makes sense as children are not likely to be Hamas militants. The following graph shows the actual number of casualties vs. the number of expected casualties in this case:


If the first possibility I mentioned before is correct -- that is, all excess casualties are Hamas militants -- then the sum of all the differences between the green and orange bars should represent the number of militants killed in Gaza. That sum is equal to 650. Since the data contains 1313 casualties it suggests that the percentage of militants killed is 50%, even if we assume no female militants. This is a much lower estimate than the international ones.

How would our conclusion about the number of Hamas casualties change if children are less likely to be killed than adults (a combination of the first and third possibilities mentioned before)? To answer this question let's assume that the probability of a child fatality (0-14 years brackets) is only 60% of the probability of an adult civilian fatality. The following graph shows the actual casualties vs. expected casualties in that scenario:


You can see (in orange) that there is a jump at the 15-19 age bracket of the expected casualties, which is similar to the jump in actual casualties (in green). The militant casualties are equal to the sum of all differences (green bars minus orange bars) for ages 15-59, which is equal 389. Using this figure we find that 70% of all casualties are civilians -- in line with UN estimates.

Clearly, there are many other explanations for the difference in distribution between actual and expected casualties. If we also assume that that 20-29 year old males are more likely to be killed than other groups (that's the second possibility I mentioned before), then it is easy to find that the civilian casualty rate is 80% -- in line with Palestinian estimates.

The best way to get out of this problem is to find a similar case of conflict in which casualties were truly random. This is not an easy task. At first, I thought of looking at Israeli casualties during July 2014 -- a period in which Hamas fired indiscriminate rocket fire towards Israel. The problem is that there isn't enough data to use -- that is, luckily, there were only few Israeli civilian casualties. I then decided to look at Israeli civilian casualties during the second intifada and the following couple of years. In that period Israeli civilians were, for the most part, randomly targeted by Palestinian terrorist organizations.

I use data on casualties supplied by Wikipedia, which is based on statistics supplied by B'Tzelem. Data contains civilian casualties only, between October 2000 and December 2008. The majority of casualties are from the first years of the hostilities. In total, I have data for 762 casualties, including age and gender. There are 444 male and 318 female casualties. As before, I use the CIA factbook's data on the Israeli population distribution. The population distribution data is from 2013 since I could not find such data for the 2000s. The following graphs present the distribution of Israeli civilian casualties in the second intifada by gender:


The two graphs show the distribution of Israeli casualties in blue and the general Israeli population distribution in red. On the whole, there is one very striking difference between the actual casualties distribution and the population distribution: children (ages 0-14) are much less likely to be casualties. Casualties among other age groups are more or less in line with what we would expect given the population distribution.

First, a disclaimer: we must be careful in using the Israeli data to analyze the Gazan one. For example, one major difference is that Israel attacked residential targets in Gaza while the Palestinian terrorists attacked many public locations. Saying that, if indeed the distribution of civilian casualties in Israel and Gaza are similar then a mixture of the first and third possibilities I mentioned -- that the distribution of Palestinian casualties is explained by IDF targeting Hamas militants and by lower casualties among children than among civilian adults -- seems to be a likely explanation for the rate of civilian casualties in Gaza -- about 70%.

To sum it up, it is very difficult to assess the number of civilian and militant casualties simply from the distribution of casualties. Without knowing the true distribution of civilian casualties there is simply no way to be certain about the number of militants among total casualties. More generally -- be careful of numbers and figures -- they are too easily manipulated to support one reality over the other.

No comments:

Post a Comment