The Fifty States of Nate

Dear readers,

Today’s post deals with psephology, the study of elections, and more specifically, the application of mathematics and statistics to it. This field has gained prominence over the past few years in the United States, largely thanks to the spectacularly successful (and often attacked) predictions Nate Silver has published on his blog. While I am not a psephology expert by any means, I understand enough about Nate Silver’s models, and mathematical models in general, to hopefully provide you with an interesting perspective.

The first point I want to make is that, although Nate Silver did predict the outcome of the 2012 United States elections correctly in all 50 states, this in itself is neither his most significant achievement nor an extremely impressive one. His most significant achievement is to highlight the principled application of mathematics and statistics to an area that has up to now been subject to large amounts of human bias. In this, he is far from being alone; however, thanks to his blog’s association with the New York Times, he is the most visible. I give him credit for drawing attention to the power of mathematics and statistics.

Nor did Nate Silver provide the most accurate predictions of the last election. In a detailed analysis, the Center for Applied Rationality shows that the predictions made by Drew Linzer and Sam Wang were somewhat more accurate once you score not only the predicted winner in each state but also the predicted vote share for each party. As for whether calling all 50 states correctly is in fact impressive, consider that there was considerable uncertainty in only 9 of the 50 states. This means that even a completely random prediction, made by flipping a coin for each of those 9 states (and calling the other 41 the obvious way), had a 1 in 2^9 = 512, or roughly 0.2%, chance of being perfect. Of course, this probability goes up tremendously if good data is available from election polls.
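
For readers who want to check the coin-flip arithmetic, here is a small sketch that computes the probability exactly and then estimates it by simulation. It assumes, as the argument above does, that the 41 safe states are called correctly and that each of the 9 close states is an independent 50/50 guess; the function names are my own, purely for illustration.

```python
import random

UNCERTAIN_STATES = 9  # number of close states, as stated in the post


def exact_probability(n: int) -> float:
    """Chance that n independent fair coin flips all match the true outcomes."""
    return 0.5 ** n


def simulated_probability(n: int, trials: int = 200_000, seed: int = 2012) -> float:
    """Monte Carlo estimate of the same probability (assumes fair, independent flips)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each guess is right with probability 1/2; a perfect map needs all n right.
        if all(rng.random() < 0.5 for _ in range(n)):
            hits += 1
    return hits / trials


p = exact_probability(UNCERTAIN_STATES)
print(f"exact: 1/{2 ** UNCERTAIN_STATES} = {p:.4%}")
print(f"simulated: {simulated_probability(UNCERTAIN_STATES):.4%}")
```

The simulated figure should land close to the exact 1/512; the point is simply that a perfect 50-state map is not astronomically unlikely once most states are foregone conclusions.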

Furthermore, while mathematical models are great for bringing together data from different sources (in Nate’s case, poll data, along with some understanding of how voter preferences change over time), they are also vulnerable to biases in that input data. In particular, if the data sources themselves are noisy or inaccurate, even a good mathematical model may go astray. This phenomenon, known as “garbage in, garbage out”, was partly responsible for Nate Silver’s rather poor prediction of the outcome of the 2010 UK general election. Indeed, unlike poll data in the US, poll data in the UK is not disaggregated by region, so there was necessarily more uncertainty about regional variation, and this appears to have thrown Nate off.
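
To make the “garbage in, garbage out” point concrete, here is a toy simulation. It is a plain poll average, not Nate Silver’s model (which he has not published), and every number in it is hypothetical: each simulated poll carries the same systematic bias (say, from a skewed sample) plus independent sampling noise. Averaging more polls shrinks the noise, but no amount of averaging removes the shared bias.

```python
import random
import statistics

TRUE_SHARE = 0.50   # hypothetical true vote share for a candidate
SHARED_BIAS = 0.02  # hypothetical systematic error common to every poll
NOISE_SD = 0.03     # hypothetical per-poll sampling noise (standard deviation)


def simulate_polls(n_polls: int, seed: int = 42) -> list[float]:
    """Return n_polls simulated poll results: truth + shared bias + independent noise."""
    rng = random.Random(seed)
    return [TRUE_SHARE + SHARED_BIAS + rng.gauss(0.0, NOISE_SD) for _ in range(n_polls)]


for n in (5, 50, 5000):
    average = statistics.mean(simulate_polls(n))
    # The error settles near SHARED_BIAS rather than shrinking toward zero.
    print(f"{n:>5} polls: average = {average:.3f}, error = {average - TRUE_SHARE:+.3f}")
```

As the number of polls grows, the error converges to the bias built into the inputs, which is exactly the failure mode a cleverer model cannot fix on its own.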

About a month before the 2012 US election, Nate Silver himself pointed out, “I’m sure that I have a lot riding on the outcome […]. I’m also sure I’ll get too much credit if the prediction is right and too much blame if it is wrong.” This brings me to my next point. Although mathematical models are frequently used to predict reality, this is neither their only nor even their main use. It may seem like a good idea to evaluate models by how well they predict reality, but their real value lies in helping us understand which factors (or variables) are important and which are less so. Their main role is to provide insights into complex phenomena, whether climate change, the etiology of diseases, or election outcomes.

Unfortunately, by keeping his models private, Nate Silver seems to miss the opportunity to let them provide these insights. This is the main concern I have about his work. Perhaps the academic idealist in me wants to be able to reproduce the results of any modeling effort, if only in principle, and Nate may have valid commercial or ethical reasons for not disclosing his models. Still, for the sake of transparency, it would be a good idea to let his fellow psephologists look “under the hood”, just as the aforementioned Drew Linzer, Sam Wang, and many others do. I strongly believe that transparency and open sharing are critical for advancing the field, as well as for preventing reliance on “tweaks” that may temporarily improve model performance but are detrimental in the long run.

In any case, I just added Nate Silver’s recent book to my reading list. It seems that the book tackles a wide variety of applications of mathematical modeling (much like I’m planning to do in this blog). I’ll be sure to read it over the New Year break and report on the experience, and I welcome your comments if you’ve already had a chance to read it, or simply want to comment on another issue discussed here.