So it is election time again, and by and large I ignore politics. I just try to stay out of it, not because I don’t care, but because I don’t want to be seen as overtly campaigning for one person or another. For context, I live in Utah’s 4th congressional district. My current representative is Mia Love. It turns out that this election cycle has not been good for her. The race is incredibly close, or at least that’s what the news is saying. Let’s try to figure out just how close this race is going to be.
The latest poll that I could find (as of writing) was this one done by the New York Times and Siena College. It says that Mia Love is at a 2-point deficit to Ben McAdams. We’ll make some simplifying assumptions, which probably means that you shouldn’t take the rest of this analysis too seriously, because it is, umm, garbage-for-fun data scientist stuff and not real political analysis. Anyway, the first assumption is that all of the undecided voters split evenly between Mia and Ben. The second is that the difference between Mia and Ben is normally distributed. I’d argue that there are better methods, but hey, let’s just simplify a little bit.
So given the polling data, you may be saying that it looks a little grim for Mia, but there is a 5-point margin of error. Now I don’t know about you, but I find that awfully hard to interpret. What it means is that, 95% of the time, we expect the actual election result to land somewhere between Mia Love losing by 7 points and winning by 3 points. Great, you say, the race is a toss-up. Actually, it looks quite a bit grimmer for Mia Love than you realize.
Personally, I hate how election poll data is presented. It gets presented this way because it attracts eyeballs, and thus profit for the news organization. A close race is way more fun to report on than, say, a slam-dunk win for one side or the other. I get it, but come on, we know how to do better.
So our model is that the true margin follows a normal curve with a mean of -0.02 and a standard deviation of 0.025, essentially half of the 5-point margin of error (a 95% interval spans roughly two standard deviations on either side of the mean). Cool, so what is the probability that the win-loss number is actually greater than zero, indicating a win for Mia Love?
Let’s use some handy-dandy Python to figure it out. It literally only takes two lines of code.
from scipy.stats import norm
print(1 - norm(loc=-0.02, scale=0.025).cdf(0))
What you find if you run this code is that the probability of a Mia Love win is only 21.19% (rounded to two decimal places). Ouch! According to this poll, we should only expect Mia Love to win about 1 time in 5. That doesn’t sound particularly close to me.
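If you want to poke at that number a bit, here is a slightly longer sketch of the same calculation. The helper name win_probability and the variable names are mine, made up for illustration. It also tries one alternative assumption: instead of treating the standard deviation as exactly half of the margin of error, it divides by the 97.5th percentile of the standard normal (about 1.96), which is the usual multiplier behind a 95% interval. Treat it as a rough sanity check under the same simplifying assumptions, not a better model.

```python
from scipy.stats import norm

lead = -0.02            # Mia Love's polled lead over Ben McAdams (a 2-point deficit)
margin_of_error = 0.05  # the poll's reported 95% margin of error

def win_probability(lead, sd):
    """P(true lead > 0) under a normal model centered on the polled lead.

    Hypothetical helper for illustration; sf(x) is the same as 1 - cdf(x).
    """
    return norm.sf(0, loc=lead, scale=sd)

# Assumption used in this post: sd is half of the margin of error.
p_rule_of_thumb = win_probability(lead, margin_of_error / 2)

# Alternative assumption: divide by the ~1.96 multiplier of a 95% interval.
p_exact_z = win_probability(lead, margin_of_error / norm.ppf(0.975))

print(f"sd = margin / 2:    {p_rule_of_thumb:.2%}")
print(f"sd = margin / 1.96: {p_exact_z:.2%}")
```

The two versions differ by well under a percentage point, so the "about 1 in 5" conclusion does not hinge on whether you use 2 or 1.96.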
The nice thing is that we have a couple of polls to look at. In the next blog post, I’m going to use the same methodology to generate probabilities for a Mia Love win for each one and then use Bayesian Updating to figure out our best guess for seeing Mia survive this election cycle.