**Update (8 January 2013):** After I wrote this article, I heard that Mother Jones put their data on US mass shootings online. Going through this data, I realized that I made a number of errors in transcribing the data from their website. I have corrected the numbers and graphs below. These changes actually make the data fit a Poisson distribution more poorly, weakening my original claim. I apologize for my sloppiness in this regard.

In the wake of the tragic massacre at Sandy Hook Elementary School, there’s been a lot of discussion about whether mass shootings in the United States are on the rise. Some sources argue that they are, while others argue that the rate has stayed more or less constant.

Steven Pinker, author of *The Better Angels of Our Nature: Why Violence Has Declined*, was recently interviewed by CNN. When asked whether incidents such as the Sandy Hook massacre represent a real rise in mass shootings, he responded:

It’s not clear whether we’re seeing a real uptick, or just a cluster of events that are more or less distributed at random. You’ve got to remember – random events will occur in clusters just by sheer chance. So we don’t really know whether the fact that there are many of them in the year 2012 represents a trend or just a very unlucky year.

In this article, I’d like to use data available online to address this question.

I recently wrote a post about randomness and rare events. The main lesson from that article is that **randomness isn’t the same thing as uniformity.** For example, if on average, sharks attack swimmers 3 times a year, then just by chance, you will expect to see years in which no swimmers are attacked, and years in which 7 swimmers are attacked. To our eyes, streaks like this don’t seem random. But, as I argue in my previous post, *we are typically not good judges of randomness.* In particular, we vastly underestimate the likelihood of such streaks. And so the question is,

**how can you test whether a set of events is random?**

Here’s how. **There is a formula that tells you how many times you expect to see streaks arise from a random process.** It’s called the Poisson distribution, and it assumes that your events are rare, have a fixed average rate (i.e. that events are just as likely to occur at any time), and are independent (i.e. that one event does not make another more or less likely). You can then compare the number of predicted streaks to the real number of streaks in your data, and mathematically test whether a set of events is random or not.
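To make the shark example concrete, here is a minimal sketch of the Poisson formula, P(k) = rate^k · e^(−rate) / k!, evaluated at an average of 3 attacks per year. The helper name `poisson_pmf` is chosen here purely for illustration and is not part of the original analysis.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k events in a period whose average count is `rate`."""
    return rate ** k * math.exp(-rate) / math.factorial(k)

# Shark example from the text: an average of 3 attacks per year.
p_zero = poisson_pmf(0, 3)   # chance of a year with no attacks
p_seven = poisson_pmf(7, 3)  # chance of a year with exactly 7 attacks

print(f"P(0 attacks in a year) = {p_zero:.3f}")
print(f"P(7 attacks in a year) = {p_seven:.3f}")
```

Both probabilities come out at a few percent, so over several decades of data neither a quiet year nor a 7-attack year would be especially surprising.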

*To summarize: if the incidences of mass shootings in the US match a Poisson distribution, then this argues that the streaks (years with unusually high number of shootings) are expected due to chance. If the data doesn’t fit a Poisson distribution, then this suggests that it violates one of the assumptions – either mass shootings are not independent events, or the rate is falling, or it’s on the rise.*

**The data.** I downloaded data for mass shootings in the United States occurring from 1982 to 2012, from this comprehensive Mother Jones article on mass shootings. I used their numbers because they compiled information from multiple credible sources, and they clearly outlined the criteria they used to classify a crime as a mass shooting. *(Update: this link has the data in easily accessible formats)*

Their data shows a total of 62 mass shootings in 31 years – an average of 2 mass shootings per year. However, 2012 was the most violent year on record, clocking in at 7 mass shootings. **Is this an outlier, or would you expect to see streaks this large, simply due to chance?**

To get at this question, I counted years in which there were 0 mass shootings, 1 mass shooting, 2 mass shootings, and so on.

Number of Mass Shootings in a Year | Number of Years |
---|---|
0 | 3 |
1 | 13 |
2 | 5 |
3 | 5 |
4 | 3 |
5 | 1 |
6 | 0 |
7 | 1 |
Out of 31 years of data, we find one year with 7 mass shootings, and ~~four~~ three years with no mass shootings. **Are these values consistent with an average of 2 mass shootings a year?**

To find out, we can compare these counts to a Poisson distribution with an average value of 2.

In the graph above, the blue bars represent the observed instances of 0, 1, 2, 3, … mass shootings in a year. For example, the long blue bar tells us that there were 13 years with one mass shooting per year. The red dotted curve is the Poisson distribution – these are the outcomes that one expects from a random process with an average value of 2 per year. ~~To my eye, the red curve sort of fits the data, but not quite.~~

Number of mass shootings in a year | Observed number of years | Expected number of years (Poisson) |
---|---|---|
0 | 3 | 4.2 |
1 | 13 | 8.39 |
2 | 5 | 8.39 |
3 | 5 | 5.59 |
4 | 3 | 2.8 |
5 | 1 | 1.12 |
6 | 0 | 0.37 |
7 | 1 | 0.11 |
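The expected column can be reproduced from the observed counts alone. The sketch below assumes the expected number of years is simply 31 times the Poisson probability at the observed average rate; the variable and function names are illustrative, not taken from the original analysis.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k events in a period whose average count is `rate`."""
    return rate ** k * math.exp(-rate) / math.factorial(k)

# Observed number of years with 0, 1, ..., 7 mass shootings (from the table).
observed = [3, 13, 5, 5, 3, 1, 0, 1]

n_years = sum(observed)                                   # 31 years
n_shootings = sum(k * n for k, n in enumerate(observed))  # 62 shootings
rate = n_shootings / n_years                              # 2.0 per year

# Expected number of years with k shootings under a Poisson model.
expected = [n_years * poisson_pmf(k, rate) for k in range(len(observed))]

for k, (obs, exp) in enumerate(zip(observed, expected)):
    print(f"{k} | {obs} | {exp:.2f} |")
```

Note that the expected counts stop at 7 and so sum to slightly less than 31; the small remainder is the probability of a year with 8 or more shootings.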

But instead of trusting my eye, we can use statistics to compare these two curves. I used a chi-squared test to test whether the two distributions were significantly different, and found a p-value of ~~0.18~~ 0.09. What does this mean? **It suggests that there ~~is no~~ isn’t strong evidence of clustering beyond what you would expect from a random process. In other words, the occurrences of mass shootings from 1982-2012 are ~~consistent~~ not inconsistent with the assumption that shootings are independent events, occurring at an average rate of 2 per year.**

*However, a p-value of ~~0.18~~ 0.09 is not particularly high, and if we see ~~a few more years~~ another year as extreme as 2012, it’s likely that this will rule out the hypothesis that mass shootings are random events.*
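For readers who want to check the test, here is a rough sketch of a chi-squared goodness-of-fit calculation using only the Python standard library. The post does not say how the bins or degrees of freedom were chosen, so this sketch assumes the eight bins exactly as tabulated and 7 degrees of freedom (bins minus one); the helper `chi2_sf` is written here for illustration and uses the standard recurrence for the regularized incomplete gamma function.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k events in a period whose average count is `rate`."""
    return rate ** k * math.exp(-rate) / math.factorial(k)

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

observed = [3, 13, 5, 5, 3, 1, 0, 1]  # years with 0..7 shootings
n_years = sum(observed)
rate = 2.0                            # average shootings per year
expected = [n_years * poisson_pmf(k, rate) for k in range(len(observed))]

# Pearson statistic: sum of (observed - expected)^2 / expected over the bins.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1                # assumption: 8 bins, no further adjustment
p_value = chi2_sf(statistic, df)

print(f"chi-squared = {statistic:.2f}, df = {df}, p = {p_value:.3f}")
```

Under these assumptions the p-value comes out at roughly 0.09. Most of the statistic comes from the single year with 7 shootings, and several expected counts are well below 5, where the chi-squared approximation is usually considered rough, so a different choice of bins or degrees of freedom would give a different number.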

What do I conclude from this? **If mass shootings are really occurring at random, then this suggests that they are extreme, hard-to-predict events, and are perhaps not the most relevant measure of the overall harm caused by gun violence.** (*Update: That last claim is my deduction and not a conclusion of the above analysis – in response to some of the comments on Hacker News, I wanted to clarify this point.*) I agree with Steven Pinker’s take, and with this analysis by Chris Uggen, who says:

a narrow focus on stopping mass shootings is less likely to produce beneficial changes than a broader-based effort to reduce homicide and other violence. We can and should take steps to prevent mass shootings, of course, but these rare and terrible crimes are like rare and terrible diseases — and a strategy to address them is best considered within the context of more common and deadlier threats to population health.

We are compelled to pay attention to extreme events. In the words of Steven Pinker, “*we estimate risk with vivid examples that we recall*”. But as much as we should try to prevent these horrific extreme events from taking place, we should not use them as the sole basis for making inferences that determine policy. The outliers are a tragic part of the overall story, but we also need to pay attention to the rest of the distribution.
