# Are mass shootings really random events? A look at the US numbers.

Update (8 January 2013): After I wrote this article, I heard that Mother Jones put their data on US mass shootings online. Going through this data, I realized that I made a number of errors in transcribing the data from their website. I have corrected the numbers and graphs below. These changes actually make the data fit a Poisson distribution more poorly, weakening my original claim. I apologize for my sloppiness in this regard.

In the wake of the tragic massacre at Sandy Hook Elementary School, there’s been a lot of discussion about whether mass shootings in the United States are on the rise. Some sources argue that mass shootings are on the rise, while others argue that the rate has stayed more-or-less constant.

Steven Pinker, author of The Better Angels of Our Nature: Why Violence Has Declined, was recently interviewed by CNN. When asked whether incidents such as the Sandy Hook massacre represent a real rise in mass shootings, he responded:

It’s not clear whether we’re seeing a real uptick, or just a cluster of events that are more or less distributed at random. You’ve got to remember – random events will occur in clusters just by sheer chance. So we don’t really know whether the fact that there are many of them in the year 2012 represents a trend or just a very unlucky year.

In this article, I’d like to use data available online to address this question.

I recently wrote a post about randomness and rare events. The main lesson from that article is that randomness isn’t the same thing as uniformity. For example, if on average, sharks attack swimmers 3 times a year, then just by chance, you will expect to see years in which no swimmers are attacked, and years in which 7 swimmers are attacked. To our eyes, streaks like this don’t seem random. But, as I argue in my previous post, we are typically not good judges of randomness. In particular, we vastly underestimate the likelihood of such streaks. And so the question is, how can you test whether a set of events is random?

Here’s how. There is a formula that tells you how many times you expect to see streaks arise from a random process. It’s called the Poisson distribution, and it assumes that your events are rare, have a fixed average rate, and are independent (i.e. that events are just as likely to occur at any time). You can then compare the number of predicted streaks to the real number of streaks in your data, and mathematically test whether a set of events is random or not.
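As a rough illustration of that formula, here is a minimal Python sketch of the Poisson probability, applied to the shark example from the previous paragraph. The function name `poisson_pmf` and the variable names are my own choices for illustration, and the rate of 3 per year is the hypothetical figure used above, not real shark data.

```python
import math

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of seeing exactly k events in a period when the average is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)

# Hypothetical shark example from the text: 3 attacks per year on average.
shark_rate = 3.0
p_zero = poisson_pmf(0, shark_rate)   # a year with no attacks
p_seven = poisson_pmf(7, shark_rate)  # a year with exactly 7 attacks

print(f"P(0 attacks in a year) = {p_zero:.3f}")
print(f"P(7 attacks in a year) = {p_seven:.3f}")
```

Under these assumptions, a year with no attacks has a probability of about 5%, and a year with exactly 7 attacks about 2%. Both are uncommon, but neither is surprising over a few decades of data.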

To summarize: if the incidences of mass shootings in the US match a Poisson distribution, then this argues that the streaks (years with unusually high number of shootings) are expected due to chance. If the data doesn’t fit a Poisson distribution, then this suggests that it violates one of the assumptions – either mass shootings are not independent events, or the rate is falling, or it’s on the rise.

The data. I downloaded data for mass shootings in the United States occurring from 1982 to 2012, from this comprehensive Mother Jones article on mass shootings. I used their numbers because they compiled information from multiple credible sources, and they clearly outlined the criteria they used to classify a crime as a mass shooting. (Update: this link has the data in easily accessible formats)

Their data shows a total of 62 mass shootings in 31 years – an average of 2 mass shootings per year. However, 2012 was the most violent year on record, clocking in 7 mass shootings. Is this an outlier, or would you expect to see streaks this large, simply due to chance?
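One way to get a first feel for this question is to ask how likely a year with 7 or more events is under a Poisson model with a rate of 2 per year, and how likely it is that at least one such year shows up in 31 years. The sketch below is my own back-of-the-envelope check, not part of the original analysis; it assumes independent years and a fixed rate, and the names are illustrative.

```python
import math

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events in a period with average `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)

rate = 2.0   # average mass shootings per year (62 events / 31 years)
years = 31   # length of the record, 1982 to 2012

# Chance that a single given year has 7 or more events.
p_year = 1.0 - sum(poisson_pmf(k, rate) for k in range(7))

# Chance that at least one of the 31 years has 7 or more events,
# assuming the years are independent of one another.
p_any = 1.0 - (1.0 - p_year) ** years

print(f"P(7+ in a given year)        = {p_year:.4f}")
print(f"P(7+ in at least one of 31)  = {p_any:.3f}")
```

Under these assumptions, any single year has roughly a 0.5% chance of reaching 7 or more, but the chance of seeing at least one such year somewhere in a 31-year record is roughly 13%.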

To get at this question, I counted years in which there were 0 mass shootings, 1 mass shooting, 2 mass shootings, and so on.

| Number of mass shootings in a year | Number of years |
|---|---|
| 0 | 3 |
| 1 | 13 |
| 2 | 5 |
| 3 | 5 |
| 4 | 3 |
| 5 | 1 |
| 6 | 0 |
| 7 | 1 |
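The table can be checked against the totals quoted earlier with a few lines of Python. This is a small sketch of my own; the sample variance at the end is an extra sanity check that is not in the original analysis (for a Poisson process, the variance is expected to be close to the mean).

```python
# Years observed for each yearly count, copied from the table above.
counts = {0: 3, 1: 13, 2: 5, 3: 5, 4: 3, 5: 1, 6: 0, 7: 1}

n_years = sum(counts.values())                      # number of years in the record
n_events = sum(k * n for k, n in counts.items())    # total number of mass shootings
mean = n_events / n_years                           # average per year

# Sample variance of the yearly counts (dividing by n - 1).
variance = sum(n * (k - mean) ** 2 for k, n in counts.items()) / (n_years - 1)

print(f"years = {n_years}, events = {n_events}, mean = {mean:.2f}")
print(f"sample variance = {variance:.2f}, variance/mean = {variance / mean:.2f}")
```

The totals reproduce the 31 years and 62 events stated in the text. The variance comes out somewhat above the mean, which hints at a little more spread than a pure Poisson model predicts, though with only 31 data points this is weak evidence on its own.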

Out of 31 years of data, we find one year with 7 mass shootings, and three years with no mass shootings. Are these values consistent with an average of 2 mass shootings a year?

To find out, we can compare these counts to a Poisson distribution with an average value of 2.

In the graph above, the blue bars represent the observed instances of 0, 1, 2, 3, ... mass shootings in a year. For example, the long blue bar tells us that there were 13 years with one mass shooting per year. The red dotted curve is the Poisson distribution – these are the outcomes that one expects from a random process with an average value of 2 per year. To my eye, the red curve sort of fits the data, but not quite.

| Number of mass shootings in a year | Observed number of years | Expected number of years (Poisson) |
|---|---|---|
| 0 | 3 | 4.2 |
| 1 | 13 | 8.39 |
| 2 | 5 | 8.39 |
| 3 | 5 | 5.59 |
| 4 | 3 | 2.8 |
| 5 | 1 | 1.12 |
| 6 | 0 | 0.37 |
| 7 | 1 | 0.11 |
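The expected column can be reproduced by multiplying the Poisson probability of each count by the 31 years in the record. The sketch below assumes a rate of exactly 2 per year, as in the text; the variable names are mine.

```python
import math

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events in a period with average `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)

n_years = 31
rate = 2.0
observed = [3, 13, 5, 5, 3, 1, 0, 1]  # years with 0, 1, ..., 7 mass shootings

# Expected number of years with k events = (number of years) * P(k events).
expected = [n_years * poisson_pmf(k, rate) for k in range(len(observed))]

for k, (obs, exp) in enumerate(zip(observed, expected)):
    print(f"{k}  observed = {obs:2d}  expected = {exp:5.2f}")
```

The expected values do not sum to exactly 31, because the small probability of 8 or more events in a year is left out.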

But instead of trusting my eye, we can use statistics to compare these two curves. I used a chi-squared test to test whether the two distributions were significantly different, and found a p-value of 0.09. What does this mean? It suggests that there isn’t strong evidence of clustering beyond what you would expect from a random process. In other words, the occurrences of mass shootings from 1982-2012 are not inconsistent with the assumption that shootings are independent events, occurring at an average rate of 2 per year. However, a p-value of 0.09 is not particularly high, and if we see another year as extreme as 2012, it’s likely that this will rule out the hypothesis that mass shootings are random events.
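For readers who want to try the test themselves, here is a minimal sketch using only the Python standard library. The article does not say how the bins were chosen or how many degrees of freedom were used, so both are assumptions here: I keep all eight bins (0 through 7) and use 7 degrees of freedom. The helper `chi2_sf` is my own closed-form implementation for integer degrees of freedom, not a library call.

```python
import math

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events in a period with average `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series in x.
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail

observed = [3, 13, 5, 5, 3, 1, 0, 1]               # years with 0..7 events
expected = [31 * poisson_pmf(k, 2.0) for k in range(8)]

# Pearson's chi-squared statistic: sum of (observed - expected)^2 / expected.
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                              # assumption: 7 degrees of freedom
p_value = chi2_sf(stat, df)

print(f"chi-squared = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

With these choices the p-value lands close to the 0.09 reported above. Most of the statistic comes from the single year with 7 shootings, where the expected count is only about 0.11. Many textbooks recommend pooling bins with small expected counts, and some would subtract a further degree of freedom because the rate was estimated from the data; either choice changes the p-value, so this result should be read as indicative rather than definitive.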

What do I conclude from this? If mass shootings are really occurring at random, then this suggests that they are extreme, hard-to-predict events, and are perhaps not the most relevant measure of the overall harm caused by gun violence. (Update: That last claim is my deduction and not a conclusion of the above analysis. In response to some of the comments at Hacker News, I wanted to clarify this point.) I agree with Steven Pinker’s take, and with this analysis by Chris Uggen, who says:

a narrow focus on stopping mass shootings is less likely to produce beneficial changes than a broader-based effort to reduce homicide and other violence. We can and should take steps to prevent mass shootings, of course, but these rare and terrible crimes are like rare and terrible diseases — and a strategy to address them is best considered within the context of more common and deadlier threats to population health.

We are compelled to pay attention to extreme events. In the words of Steven Pinker, “we estimate risk with vivid examples that we recall”. But as much as we should try to prevent these horrific extreme events from taking place, we should not use them as the sole basis for making inferences that determine policy. The outliers are a tragic part of the overall story, but we also need to pay attention to the rest of the distribution.

64 Comments


• http://twitter.com/ivanca Ivan Castellanos

You are separating mass shootings from normal shootings/murders, which is not acceptable in a statistical study.

• tba

Of course it’s acceptable if he’s just studying mass shootings.

• http://twitter.com/ivanca Ivan Castellanos

It is idiotic to draw conclusions about a subject from only a small subset of data about that subject. It is like a study of massive in-tunnel car crashes that draws conclusions about all kinds of car crashes. Using subsets of data is a technique widely used by Fox News and many other strongly biased news outlets.

And it is not as if the value of human life increases exponentially when people are murdered close together in space and time; someone killing 20 children in 5 minutes is not really worse than someone killing 100 in 2 years.

• http://twitter.com/ivanca Ivan Castellanos

I was IP banned in hacker news (jQueryIsAwesome) for copying this comment there and declaring that I flagged it for the mentioned reasons.

I don’t feel bad about it, you don’t use a little ambiguous controversial subset of data, draw conclusions from it and call it a valid article. You just don’t.

• http://www.empiricalzeal.com Aatish

I came in to this question genuinely wanting to know whether the statistics are consistent with the idea that mass shootings are random events that occur at a constant rate.

My conclusions are
1. The data is consistent with the assumption that US mass shootings are random unrelated events with a rate of 2 per year. This doesn’t mean it is definitely the case, it means this model can explain the data so far.
2. The 7 mass shootings in 2012 do not necessarily signal a significant change from the past 30 years. (Although if it happens again, then yes, it probably does.)

These are not the conclusions that I set out to achieve, but they are what I found, and I will be happy to be proven wrong or apply the analysis to different data.

Mass shootings may generally be an ambiguous term, but it is one that is used in this national conversation, and I specifically chose the Mother Jones data because they are careful in describing exactly what they consider a mass shooting.

• http://twitter.com/ivanca Ivan Castellanos

First, I’m sorry for the arrogance in my previous comments; leaving that behind, let’s continue:

You are using 5 criteria; all of them only help establish what is considered a mass shooting, but you have almost no data about the mass shooters themselves, which is arguably the most important factor in a mass shooting. That means you don’t take into account motivations, psychological profile, gun availability in the area and many other variables about the perpetrator. Not only that, but the criteria you are using are very strict about what a mass shooting is, so your statistical sample is very small (62).

A little example: Let’s say that Sam tells you about 4 forest zones and the number of apples in every zone. Sam has a very strict definition of what an apple is, so you trust his data. Later on, based on the number of apples, you conclude that the number of apples in each zone is really random (coincidentally with a p-value of 0.18). But you decide to go to the 4 zones and, to your surprise, 2 of the zones have a river in the middle, another one has a toxic factory in the middle, and the other is a desert and has no trees.

In retrospect, was the number of apples really random? No, it was never random; you just didn’t know some very basic details, so it looked random in your math. It is obvious that in any situation you can’t know all the variables, but by only looking at a very slanted byproduct of something else, your statistics will yield no trustworthy conclusions.

I would also like to add that many of your conclusions seem to be derived from common sense and not explicitly connected with your numeric results; I agree with most of them, but it is still disingenuous to pretend they are logical conclusions derived from your statistical analysis.

• Thomas Welsh

Very well spoken. Thank you.

• http://www.facebook.com/profile.php?id=1316818033 Laurie Wiegler

When I found out I had a 1 percent chance of getting pregnant, I went crying to my doctor.. “But if your pregnancy is that 1 percent,” he assuaged, “you could have a 100 percent chance.” I don’t approximate your math skills here (or my doctor’s), but it would seem that like fertility, one can’t possibly throw in all these averages without adding other factors.

• Chris Martin

If you have the exact date of each event, aren’t you doing your data a disservice by merely counting events per year?

• http://www.empiricalzeal.com Aatish

Hmm.. are you considering looking at the distribution of time between events? It should fit an exponential distribution with a rate parameter of about 2 per year – I might check this.

• Gregor

Poisson analysis just confirms that the events are outliers.

Given the hypothesis that mass shootings tend to provoke mass shootings, a better approach would be to look at the probability of a particular shooting being found within a cluster. Run that while varying a threshold on the dispersion which associates events with a given cluster.

Also, from a social engineering perspective – there are very few incidences of mass knifing, while mass shootings are strongly correlated with the available density of firearms.

• M.

Knifes are not guns. Entirely different.

• Nicholas Lukas

that’s the point he is trying to make. you can make the argument against gun control saying there are many other unregulated objects that are dangerous.

but guns enable mass killings far more so than any other weapon. constitutional rights aside, more needs to be done about gun control and regulation..

this piece is a great read on statistical theory but its application and inferenced result is both poorly based and disturbing.

• 2ABill

That is untrue on its face.

The HappyLand fire in 1990 used a gallon of gasoline and two matches, resulting in 87 fatalities: http://en.wikipedia.org/wiki/Happy_Land_fire

The Oklahoma City bombing in 1995 resulted in 168 fatalities. Diesel fuel and fertilizer:
http://en.wikipedia.org/wiki/Oklahoma_City_bombing

September 11th 2001 – nearly 3,000 fatalities and billions of dollars in property damage, lost tax revenue, etc. Box cutters.

• http://www.facebook.com/people/Paul-David-Smith/1548548486 Paul David Smith

I’m not willing to put my “constitutional rights aside”. Guns are the most regulated consumer product in the nation and none of the proposed restrictions…which were all in place in Connecticut…address the problem. All impose burdens on the innocent that are not acceptable in a free society.

• CEriksen

Knives…the plural of knife is knives

• http://www.empiricalzeal.com Aatish

Not really – a Poisson analysis also addresses the question of whether mass shootings promote mass shootings. It explicitly assumes that this is not the case and that the events are independent. If such an analysis explains the data (and I believe it does so, but weakly) then this suggests that the copycat effect is not strong enough to rule out chance. However, I emphasize that another few outliers (years with unusually high mass shooting incidences) would easily tip this analysis in the opposite direction.

• http://twitter.com/djelkind David J. Elkind

You’re correct that a Poisson distribution assumes a fixed rate parameter. This does not, however, help establish whether a variable rate parameter has a greater or smaller likelihood relative to the fixed rate model because there are any number of ways to estimate a variable rate parameter. Indeed, how to specify the process underlying a variable rate parameter would be a really interesting model to test, evaluating, for example, the “copycat effect” of mass shootings in a year, whether restrictive gun laws reduce mass shooting events, whether mental health funding reduced mass shooting events, etc. The likelihood statistics (not p-values) from these various models would provide criteria to evaluate which model is more plausible relative to the other models.

The p-value on its own only expresses the traditional probability that the specified fixed-rate relationship could have arisen due to random chance; that is, the chi-square test you used shows a 9/50 chance that it was a simple statistical fluke. That is not the same as specifying and testing an alternative, variable-rate model.

• http://www.empiricalzeal.com Aatish

That’s a really thoughtful comment, and this would certainly be a more focused and sophisticated way to address the question of whether there is variation in the mass shooting rates, and also study the factors that affect it. I realize that my analysis is somewhat simplistic, in that it can disprove a constant rate model, but not really prove it (or provide a likelihood for it). One of my aims in putting this out there was to understand the shortcomings of this simple approach, and get suggestions on how to better model this phenomenon. So thanks very much for your insight!

• http://twitter.com/djelkind David J. Elkind

Glad to contribute to the conversation.

Gary King’s book Unifying Political Methodology provides a description of how to derive models with a number of “moving parts” using likelihood theory, particularly models where observations are not identically and independently distributed. I would suggest starting there if you are interested in taking this project further. Event count models are basically Dr. King’s wheelhouse, and so his work on that topic seems particularly relevant here.

I’m always interested in arguing about methodology. My twitter is @djelkind; we can exchange email addresses if you would like.

• Elin

Showing that events are independent year to year does not necessarily mean they are independent within a year. You would have to look at week by week or something and then you would have to deal with seasonality on top of that not to say needing to actually show that shooter b even had information about shooter a, which is hard given that they are usually dead. I’m not saying copy cat doesn’t happen but it’s very hard to show statistically and easy to get a false positive.

• Super

Aatish, I think your analysis is not telling you anything at all. Sure, the data is consistent with random events, but to say anything conclusive you need to include proper error bars and compare with other scenarios to see whether one explanation fits the data better than another. With the limited amount of data in this case, the statistical error is so large that basically any model explanation will fit, including your assumption here about independent random events.

• http://www.empiricalzeal.com Aatish

Thanks for bringing this up. You’re probably right – the data is quite limited to make a strong conclusion. My main points were that 7 shootings in a year is consistent (or rather, not inconsistent) with an average of two per year, because outliers can happen by chance, and that it is better to draw conclusions based on the data rather than relying on intuition.

I agree that the data is a bit scant to make this conclusion strongly – I’m not particularly happy with this post. If I had time to improve this, I’d try and compare the likelihood of the data fitting this model, to say a variable rate model (as another commenter here suggested).

• http://www.facebook.com/people/Glen-Raphael/749878721 Glen Raphael

There are lots of incidences of mass knifings in other cultures. Notably, China and Japan. http://en.wikipedia.org/wiki/Akihabara_massacre http://en.wikipedia.org/wiki/School_attacks_in_China_(2010%E2%80%932012)

• Jon

Surely finding that your country has two mass shootings a year is very disappointing and something to improve upon. Is two a year acceptable? I would say the US should aim for zero mass shootings a year, no matter what the data says.

• http://www.empiricalzeal.com Aatish

Word.

• http://www.facebook.com/once.upon.a.jay Jeremie Pelletier

And how would you achieve zero shootings a year? How do you improve upon the current situation?

Sure these events are unacceptable, but they are also unpredictable. Laws won’t change a thing; shootings will still happen but the rest of us will have to live with limited rights and the illusion of safety.

Stopping these is probably harder than stopping all natural disasters. At least the natural disasters are more or less predictable with today’s technology.

• fortythree

If a frequency of occurrence is random, that does not imply that it cannot be controlled in any way. Coin tosses are random because people happened to design coins that way. Do you mean to say that the measures in place to prevent such happenings, such as security systems, counselling and medical help for mentally disturbed people, or laws in general that are supposed to act as a deterrent, have no influence on the frequency and outcome of these events, just as gun control wouldn’t? I can understand we don’t have technology in place to predict storms and earthquakes. Is it so bad with people and society and governance? Given that the human race is at its most peaceful point, why shouldn’t we strive for a more peaceful outcome of these events, i.e. non-occurrence?

• http://www.facebook.com/once.upon.a.jay Jeremie Pelletier

That’s what I’m saying, all the measures in place have little to no positive outcome. For the most part they provide an illusion of safety for the rest of us.

Security, counselling and meds are surface solutions to a problem that runs much deeper within our society. They can make the symptoms go away, but not the problem.

These sciences are even less mature than the ones predicting weather. What we do for the most part is guesswork.

• Iuventius

How about zero car wrecks per year too?

• Gerry

How about zero traffic fatalities per year?

• paul

How many other man-caused deaths per year are there in the US? People are murdered for the crime of being unwanted. How is that any less significant than these shootings (which are, of course still horrific!)

• gwendolyn

I concur. I have to ask though in these two mass shootings how many people do you predict will be killed? Sorry to be so brash. But it is a little odd to bind something so lethal and heartbreaking to science.

• Fabrizio

I am curious about the 2 per year average. Why not 1 or 3?

If the average in the rest of the world was 1 or 3, would that give indications about the situation in the US?

Given that mass shootings are 2 per year on average in the US and that the US population is roughly 200 million, does that mean that every US newborn has a 1/100 millions chance of becoming a mass murderer?

Or, does the 2 per year average come from the US population developing a mass shooter with a frequency of 1/100 million? (as compared to a possibly different chance for other countries?)

• SemperWhy

You may want to double-check your estimate of the population of the United States of America.

• Anon

Compare with e.g. Germany, France, Italy, Japan, Spain, Ireland, UK, … with an average 0 mass shootings per year. The fact that the shootings occur with what looks like a random pattern doesn’t mean that they cannot be prevented or reduced. Tell the parent of a victim that we shouldn’t do anything because they’re just two a year on average.

• http://www.facebook.com/people/Petra-Thompson/100000036809170 Petra Thompson

I don’t think it is at all helpful to bring in comparisons with countries with zero mass shootings. After all, these figures show that there are years when the US has zero mass shootings, and the populations of countries like the UK and France are 20% or less of that of the US; therefore, you’d expect on population alone that there would be 5x more years with zero mass shootings than in the US.

The UK has had such mass shootings: Hungerford (1987), Dunblane (1996), Cumbria (2010)
http://en.wikipedia.org/wiki/Hungerford_massacre
http://en.wikipedia.org/wiki/Dunblane_school_massacre
http://en.wikipedia.org/wiki/Cumbria_shootings

Finland has a population of 5 million, and before 2007 mass shootings were very rare. There was one in 2007 and in 2008 and 2009, and since then they’ve reverted to years with zero mass shootings. http://en.wikipedia.org/wiki/Jokela_school_shooting
http://en.wikipedia.org/wiki/Kauhajoki_school_shooting
http://en.wikipedia.org/wiki/Sello_mall_shooting

Australia has twice as many handguns per capita than the UK, but the UK has more people murdered using handguns.

There are far too many variants to go making simple comparisons. But you are quite simply wrong when you say that the UK has had zero cases of mass shootings.

If the US has a mean of 2 shootings per year, and the population of the US is 5x larger, then we would expect the UK to only have 2 shootings every 5 years. Given that access to legal firearms in Britain is so restricted and rare, if anything the frequency of occurrence of mass shootings in Britain is more surprising than in the US.

• http://twitter.com/clockworkelves jj

How about a correlation with mass shootings and so-called “gun-free zones”? Wouldn’t mass shootings more likely occur within areas where a criminal can shoot with impunity?

• Done Already

That study has already been done. Every single mass shooting (as defined as a shooting with 4 or more fatalities) has taken place in a gun free zone with the exception of the shooting of Congresswoman Giffords.

• lou

The Fort Hood shooting in Texas was also in an area filled with heavily armed people but no one ever shot that perpetrator.

• Ken Weinert

Aside from the MPs (akin to police in the non-military world), your normal person on a military base is not walking around armed. It’s really no different than the civilian world.

• lou

The point made above was that almost all shootings are happening in gun free zones where it is not permitted for even licensed people to carry their guns (other than police). That seems to be untrue of fort hood.

• LaurenOrder26

The Fort Hood shooter WAS in an area where firearms were strictly prohibited.

• Aaron

Incorrect. Concealed carry is banned on all military installations (Art. 134, Uniform Code of Military Justice.) Permits are not recognized. All Soldiers leave their weapons locked in the arms room of their respective unit.

The Fort Hood shootings took place in one of the most tightly regulated firearms environments in the country, in short.

13+ years, United States Army.

• dynamix

maybe we can solve the mystery through statistics

• fortythree

“…if the incidences of mass shootings in the US match a Poisson distribution, then this argues that the streaks (years with unusually high number of shootings) are expected due to chance. ”

Sir/Madam, Correlation != Causality

So, in the latest unfortunate “random” event, the killer was a young, mentally (very) disturbed man, with access to guns capable of causing a lot of destruction, was able to get through the school security system (not a market, bus stop, airport), but a school. yeah, too many “random” variables. To me, the real randomness was about “when” he would have carried out this, not the “if”. Saying that frequency of earthquakes in SF being random is one thing and saying that earthquakes are distributed randomly across united states..

And what was the sample size again? 60? hmmm.

• http://www.empiricalzeal.com Aatish

I am not making a causal claim here. I am saying that a single year with 7 mass shootings is consistent (although not exceedingly likely) with the model of random, independent events with an average value of 2 per year (i.e. 62 shootings in 31 years). Google for it and you’ll see that poisson processes are regularly used to model earthquakes as well.

• fortythree

Thank you for clarifying – I must admit I used more emotion than logic. With this sample size, why would you even try to find the statistical distribution? To me the choice of a year as the time window is rather questionable. How about the day of the week? That would be a roughly similar number of chunks. What if 5 years from now there are 40 such events in that year? Even that wouldn’t deviate from a random distribution, and you will still have a statistically insignificant number. Whether it is Poisson, Gaussian or exponential is insignificant information. What are the inputs deciding the outcome of these events — people’s motivation to carry these out, their opportunities to commit them, other people’s efforts in stopping these from happening, and their opportunities. Here we are discussing opposing forces. Have these been measured? Have you taken care of Simpson’s paradox?

I must however, thank your article and discussion, I learnt more statistics today than any one day before. Peace.

• http://rizzn.com Mark ‘Rizzn’ Hopkins

Correlation != Causality .. methinks you don’t know what that means.

• deturing

There seems to be a common misconception among these comments. People are somehow thinking that “random” is synonymous with “acceptable” or “cannot be helped” (and consequently getting worked up ). That is of course not the case.
But as you point out, I think this indicates precisely that mass murders are not something that should be the sole focus of our discussions, since they are extreme events. We largely ignore the daily deaths due to legal and illegal weapons that occur in the US every day (follow @GunDeaths to find out). Perhaps focusing efforts of reducing the no. of deaths due to guns should be a primary aim?

• Nyrulez

Aatish, it is sad you have arrived at this conclusion from your statistics exercise. I work in this field, and resembling a Poisson distribution means nothing. What really matters is the mean. A mean of 100 could still be Poisson distributed. Would you find 100 shootings on average acceptable? What really matters is the mean here. The question you are trying to ask is how predictable they are.

You should update the post with correct conclusions.

• http://www.empiricalzeal.com Aatish

I am definitely not suggesting that it is acceptable. All I am saying is that the data is somewhat consistent with independent events occurring at a fixed average rate of 2 per year (i.e. the observed 62 mass shootings in 31 years).

• Elin

People seem to be missing the idea that the Poisson is based on a mean. Of course the mean in other countries is lower or higher than in the US, and of course public policy in principle could make the mean change. But the Poisson in this analysis is about what is happening within the US: is there internal evidence that the mean is rising or falling? Since the Poisson provides a good fit over a 30 year period, that’s a good indication that the mean is not changing over that period. That was the question: is the frequency increasing? The null hypothesis (the simplest possible model) has to be no. The data do not disprove the null hypothesis. Of course it could be increasing by a small amount, from say 2.0 to 2.1, and that difference would not be detectable with this level of data (and given that the base model is already predicting slightly more by chance than what we have observed empirically, it’s very unlikely to be detectable at this point). On the other hand, maybe the over-prediction means that the numbers are actually falling. However, the simplest model, and always the preferred one, is that there has been no change. You would have to look at it over time to see if there was any evidence of increase (or decrease), i.e. does the second 10 years of data look different than the first 10? Another thing would be to model it on states and/or regions and see if there are differences there.

Also we should not confuse the number of events with the number of deaths total or per event, that too could be increasing.

Just because something is random does not mean you can’t do something about it.

• Fabrizio

yes, also because you still have to fill the 4-per-year and 5-per-year slots (and there is space in the 2-per-year slot) without altering the average.
What about 2013?
If I understand correctly, each one of the three options here above is (unfortunately) more likely than 0-per-year.
Please do something. We can still save at least 44 kids.

• Devonavar

Apologies if some of this has been covered. I didn’t read any other responses … but I wrote this for a repost on Facebook. Put some time into it, so I thought I would put it here:

“it assumes that your events are rare, have a fixed average rate, and are independent”

If we are assuming a fixed average rate, doesn’t that cause a problem if you are trying to determine whether the rate of violence has increased?

the author was testing the “fixed average rate” assumption. If the measured rate were far from being a fixed average, the data would not have fit the Poisson distribution as well as they did.

uh … that’s not how this works. Poisson’s theory is true *if* the premises are true. If the premises are not true, applying it simply gives you nonsense. You cannot test your assumptions by using a theory that relies on them. If you want to test your assumptions, you need a different test.

Let’s assume that the premises are true. Poisson’s test determines whether or not a given distribution is random. As it happens, the distribution we have matches his curve. Which suggests that the incidence is random. But, if it did not match the curve, that would simply mean that the distribution was not random. In other words, some other factor would be in play. That factor need not be one of the starting assumptions. In fact, it would explicitly *not* be one of the assumptions.

As it happens, I went back and looked at the raw data. I divided it into larger chunks (yay proper sample sizes) and looked for a trend.

Here’s how the data broke down. Some years are missing because I dropped them to keep the number of years in each chunk equal.

First I split the data into halves:

’82-’96: 21
’98-’12: 39
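One hedged way to ask whether a 21 versus 39 split is larger than chance alone would produce: if both 15-year halves shared a single Poisson rate, then conditional on the total of 60, the first-half count would follow a Binomial(60, 0.5) distribution. The sketch below computes an exact two-sided p-value on that assumption. The function name `equal_rate_p_value` is illustrative, and the test ignores the fact that the split point was chosen after looking at the data.

```python
from math import comb

def equal_rate_p_value(count_a, count_b):
    """Two-sided exact p-value for 'both periods share one Poisson rate'.

    Assumes the two periods have equal length, so that conditional on
    the total, the first count is Binomial(total, 0.5).
    """
    total = count_a + count_b
    low = min(count_a, count_b)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(total, k) for k in range(low + 1)) / 2 ** total
    # Double it for the two-sided test, capped at 1.
    return min(1.0, 2 * tail)

p = equal_rate_p_value(21, 39)
print(f"p = {p:.3f}")
```

A small p-value here would count against a constant rate across the two halves, though with only two bins it says nothing about the shape of any change.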

Almost twice as many incidents in the latter half of the period as in the earlier. But, two points is not a curve. Quarters:

’82-’88: 6
’90-’96: 13
’98-’04: 12
’06-’12: 25

Not a smooth curve, but there’s definitely a trend upwards. And there are no zero points to indicate a sample size problem. Eighths (this one is missing seven data points, but the average of the missing data is 2, the same as the overall average, so there shouldn’t be much bias):

’82-’84: 3
’86-’88: 3
’90-’92: 6
’94-’96: 3
’98-’00: 9
’02-’04: 2
’06-’08: 10
’10-’12: 11

There’s a lot of random noise at this sample size, but you can still see a clear upward trend. Given that we have sample size issues at 3-year intervals, I’d say the 1-year sample size used in the original article is certainly too small to draw accurate conclusions.

One more attempt: Sixths. I’m trying to strike a happy medium between having a big enough sample size, and having enough data points. As a bonus, I only have to drop one year: ’97, which had 2 events.

’82-’86: 4
’87-’91: 8
’92-’96: 9
’98-’02: 10
’03-’07: 11
’08-’12: 18

Now there’s a trend you can draw a line through. You can even drop the low and high points as anomalies and still have enough data points left to show a slightly upward trend. I doubt a change of 1 is statistically significant though. Oh well.
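The "line you can draw through" the sixths can be made explicit with an ordinary least-squares slope. This is only a sketch: it treats the six five-year bins as evenly spaced points 0 to 5, the helper name `least_squares_slope` is illustrative, and a slope by itself says nothing about statistical significance.

```python
def least_squares_slope(counts):
    """Ordinary least-squares slope of counts against bin index 0, 1, 2, ..."""
    n = len(counts)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

sixths = [4, 8, 9, 10, 11, 18]   # counts per five-year bin, from the list above
trimmed = sixths[1:-1]           # drop the lowest and highest bins

print(f"all six bins: {least_squares_slope(sixths):.2f} events per bin")
print(f"trimmed:      {least_squares_slope(trimmed):.2f} events per bin")
```

With all six bins the fitted slope is about 2.3 additional events per five-year bin; with the first and last bins dropped it is exactly 1, which matches the "change of 1" mentioned above.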

Now. I’ll be the first to say that there isn’t enough data to draw firm conclusions. And, really, I think that was the point of the original article. There simply isn’t enough data to draw a valid statistical conclusion.

My complaint isn’t with the conclusion. My complaint is how it was reached. If you think there is something to measure here, you need to grant that there appears to be an upward trend to the data, which violates one of Poisson’s assumptions, making it an invalid analysis for the data. If you don’t think there is anything to measure here, you don’t need any further analysis: This data is random, move along, there’s nothing to see here. Poisson’s analysis is only useful if randomness is actually in question. If you feed it data that you already know is random, of course you get a random result.

Statistically, I can do a far simpler analysis than the article and reach the same conclusion. If you are going to do statistics, dropping the high and low data points as anomalies is standard practise. 2012 is clearly an outlier in our data set, therefore it should not be used to indicate a trend. It is an anomaly before any actual stats gets done.

So, to sum up: good data, lots of rational words, but piss-poor analysis. This is not good science. It just *sounds* like good science because it reached the right conclusion.

• Devonavar

Crap. It thought my angle brackets were HTML. Sorry for the random formatting. I just wanted to separate out the different Facebook posts.

• Devonavar

PS: There appears to be a minor data transcription error in the original article: I counted a distribution of 3 years with 4 shootings, and 1 year with 5 shootings (1999). This fits the Poisson curve even better…

• http://www.empiricalzeal.com Aatish

I’ll look into this – thanks.

• http://twitter.com/JeffreyGuterman Jeffrey Guterman PhD

Time will tell if the recent up-tick in mass shootings is random or reflective of a significant trend. For now, I am concerned about the seeming overemphasis that many people and institutions, especially some in the media, have placed on mass shootings. An FBI study (cited in USA Today, http://www.usatoday.com/story/news/nation/2012/12/18/mass-killings-common/1778303) found that mass killings (i.e., four or more people killed in one incident) occur in the United States about every two weeks, yet the majority of these incidents receive little or no attention from the media. This study showed that despite the frequency of these mass killings, they comprise only 1% of the approximately 15,000 annual homicides in the United States. The Newtown shooting struck a nerve throughout the nation, and for good reasons. I am concerned, however, that we should not let our emotions distract us from the bigger picture about violence, namely, that overall, homicides have been steadily decreasing in the United States and there is much to be done in addition to, and beyond, reforming gun laws and militarizing our schools. A more solution-focused approach to the problem of violence might involve investigating precisely why homicide rates have dropped in recent years, identifying positive factors that have contributed to less killing in our society, and building on such progress. – http://JeffreyGuterman.com

• Anonymous, Ph.D.

As a statistician, IMO, this will only imply the marginal distribution is Poisson, not the joint. I did my dissertation on Poisson point processes… While a test as such is unavailable there are certainly diagnostic forms available.

Personally, I’m too lazy to run the analysis and if I were to do it I would have to take a bit of time to make sure it’s reasonably correct and up to snuff, as it were. That and my employment situation means I want to stay away from anything controversial.

That being said, we are talking about time clustering, which may be inadequate to measure at a yearly resolution. If we treat this as a Poisson process with the assumption of a baseline intensity rate (i.e. events over any equally sized window are equally likely), then I think that would yield a more revealing result, for better or for worse. After all, the interest is in inter-arrival times.
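As a rough illustration of the inter-arrival idea: in a constant-rate Poisson process the gaps between events are exponentially distributed, so their coefficient of variation (standard deviation divided by mean) should be close to 1. Values well above 1 hint at clustering, and values well below 1 at overly regular spacing. The sketch below is only a diagnostic of that kind, not a formal test. The function names are illustrative, and the event times are simulated rather than taken from the shooting data.

```python
import random
from statistics import mean, pstdev

def interarrival_cv(event_times):
    """Coefficient of variation of the gaps between consecutive events."""
    times = sorted(event_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return pstdev(gaps) / mean(gaps)

def simulate_poisson_times(rate, n_events, seed=0):
    """Simulated event times from a constant-rate Poisson process.

    Gaps are drawn from an exponential distribution with the given rate;
    the seed only makes the illustration repeatable.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_events):
        t += rng.expovariate(rate)
        times.append(t)
    return times

simulated = simulate_poisson_times(rate=2.0, n_events=5000)
print(f"simulated Poisson process: CV = {interarrival_cv(simulated):.2f}")
print(f"perfectly regular events:  CV = {interarrival_cv([0, 1, 2, 3, 4]):.2f}")
```

To apply this to real data, the event times would be the incident dates expressed as, say, fractional years; with only a few dozen events the estimate would be noisy.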

To everybody else, I apologize for the advanced geekery… But this is no different than an engineer explaining behavior in his field.

• Dustin

I wanted to thank you for this analysis. I actually did some analysis of my own to try and understand what an assault weapons ban would do for gun-related homicides in the United States. From my analysis, it looks like an assault weapon is only needed for 0.78% of all gun-related homicides in the United States. Unless you want to ban all guns, and completely rid the streets of all guns (think about the war on drugs before you go down this path!), it’s not going to happen… Period.

Here’s a link to my article, which I link to yours:

http://www.dustindevries.com/content/politics/banning-assault-weapons-does-virtually-nothing/

• channelclemente

You might incorporate the data in the Mother Jones article into your lookback. They look at wounded as well as killed over a longer duration. Interesting article, bigger N. Why not explore more mean-incidents-per-interval choices?

http://www.motherjones.com/politics/2012/12/nra-mass-shootings-myth

• Yjikes

I am sad to say that, if you take into account all the countries in the western hemisphere, the mass shootings stop being random at all.
A cluster appears around the USA, making the likelihood of an incident skyrocket compared to other countries.