Here’s How to Fake Evidence of Voter Fraud

Somewhere, in a cubicle in Washington, a data analyst is panicking. She has just been asked by the Trump administration to show how 3 million people (or, preferably, more) voted illegally. Deep down, she knows that this is a ridiculous request. But she’s a team player.

She will first try to identify specific cases of clear voter fraud. The goal will be to collate these clear cases into a list of names and addresses. A list with three million entries. How hard could it be?

She’ll start with the low-hanging fruit. She’ll cross-reference voting lists (not registration lists) to the National Death Index. She needs to look at real voting lists since dead people may still be on the registration lists without actually voting. She’ll find a few matches but, unfortunately, they will all prove to be false positives. People with the same name, people who didn’t really die, people who didn’t actually vote.

She’ll then try to figure out if illegal immigrants voted by cross-referencing voter lists with E-Verify, the government’s electronic employment verification tool. She’ll get a few hits, but, again, they will all prove to be false positives. People with the same name, people who actually are citizens but aren’t in the system, and so on.

And that’s where the investigation will end, right? Wrong. Given what we have seen in the first seven days of the presidency, President Trump has shown us that he is never wrong, and must prove it despite incontrovertible evidence to the contrary. But there are still a few ways to find 3 million illegal votes. That they all depend on egregious misapplication of statistics and data manipulation can be spun into obscurity by the Press Secretary and the other dynamos in the Trump team.

Option 1: Use bad survey data.

When White House Press Secretary Sean Spicer was pressed for support of the President’s voter fraud claim, he cited this paper, by Jesse Richman, Gulshan Chattha and David Earnest. They used an internet survey and found that a small fraction of people said they voted, but also clicked that they weren’t citizens. Of 32,800 surveyed in 2008, a total of 339 people labeled themselves as non-citizens. Of those, 38 people also claimed that they voted, about 11%. Since there are 11 million illegal immigrants in this country (according to Pew Research), 11% of 11 million people would yield roughly 1.2 million fraudulent votes.
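The extrapolation is simple enough to check by hand, but here is a minimal sketch of the arithmetic in Python. The three input figures come from the paragraph above; the variable names are mine.

```python
# Figures quoted in the article: the 2008 internet survey and the Pew estimate.
non_citizen_respondents = 339
claimed_to_vote = 38
undocumented_population = 11_000_000

# Share of self-labeled non-citizens who also said they voted.
rate = claimed_to_vote / non_citizen_respondents

# Naive scale-up: apply that share to the whole population.
extrapolated_votes = rate * undocumented_population

# How many people each survey respondent is being asked to stand in for.
people_per_respondent = undocumented_population / non_citizen_respondents

print(f"share claiming to vote: {rate:.1%}")
print(f"extrapolated votes: {extrapolated_votes:,.0f}")
print(f"people represented by each respondent: {people_per_respondent:,.0f}")
```

Notice how much weight each click carries: the whole estimate rests on 38 survey responses, each one scaled up to stand for tens of thousands of people.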

Damn, not quite close enough to 3 million. But enough to get you some voter ID laws (and, after all, isn’t that really the point?).

Option 2: Use numerology

The administration has been uniquely willing to support and disseminate conspiracy theories. One could argue that the President’s real foray into politics was his vocal support of the conspiracy theory that Barack Obama was not born in the US. His refrain when confronted with facts: “I’m just asking questions”. Remember that, Mr. President. You’ll be using it a lot.

There are some interesting things that happen with long lists of numbers. Certain digits come up more than others. This phenomenon is known as Benford’s law, and it basically says that in many long lists of real-world numbers, the first digit will be “1” more often than any other digit. This is not an alt-fact, this is a true fact.
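For the curious, Benford’s law gives the probability of a leading digit d as log10(1 + 1/d). A short Python sketch (the function name is mine) prints the whole distribution:

```python
import math

def benford_first_digit(d: int) -> float:
    """Benford's law probability that the leading digit equals d (1 to 9)."""
    return math.log10(1 + 1 / d)

for d in range(1, 10):
    print(d, f"{benford_first_digit(d):.3f}")
```

The digit “1” leads about 30% of the time, and the probabilities fall steadily from there down to “9”.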

If humans are going to misreport voting numbers, they tend to do it in biased ways. No human, planning to subvert the democratic process, would falsely report votes in his or her district using nice round numbers. That’s too obvious. They would make up a number that seems real, something like 1,357. The important thing here is the digits. If we look at the second digit in vote tallies, so the theory goes, we may find evidence of election fraud because they won’t be distributed according to a set mathematical formula.

If you look at enough states, you are guaranteed to find a place where the second digits of vote totals seem off.
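Why guaranteed? Because every extra test is another chance for a fluke. Here is a rough sketch of that arithmetic, assuming the tests are independent (real vote tallies won’t be exactly independent, so treat this as an illustration only). The function name and the counts of tests are my own assumptions.

```python
def chance_of_false_alarm(n_tests: int, alpha: float) -> float:
    """Probability that at least one of n independent tests comes out
    'significant' at level alpha when nothing unusual is going on."""
    return 1 - (1 - alpha) ** n_tests

# One test per state at the conventional 0.05 threshold.
print(f"{chance_of_false_alarm(50, 0.05):.3f}")

# Three tallies per state (all votes, Clinton, Trump) at a strict 1-in-1000 threshold.
print(f"{chance_of_false_alarm(150, 0.001):.3f}")
```

Run one test per state at the usual 0.05 cutoff and a false alarm somewhere is close to certain. Even a 1-in-1000 result stops being remarkable once you have run enough tests.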

In fact, I started with Connecticut to help you out.

I took 682 polling stations in Connecticut and looked at the second digit of the vote tally reported. Here’s what I found:

That second digit seems to be spread pretty evenly across all digits from 0 to 9. But we can still look for possible conspiracies.

Some cool math tells us that the average value of the second digit in a long list of numbers should be 4.187. The average of the second digit of the 682 polling place tallies in Connecticut is 4.38. That is not statistically different than 4.187. Okay, fine. Let’s look at the second digit of only the Clinton totals: average of 4.04. Still not statistically different than 4.187.
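The “cool math” is just Benford’s law extended to the second digit: the probability of a given second digit is the sum, over every possible first digit, of log10(1 + 1/(10·d1 + d2)). A minimal Python sketch, with names of my own choosing, reproduces the 4.187 figure:

```python
import math

def benford_second_digit(d2: int) -> float:
    """Benford's law probability that the second digit equals d2 (0 to 9)."""
    return sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))

probs = [benford_second_digit(d) for d in range(10)]

# Expected value of the second digit under Benford's law.
expected_mean = sum(d * p for d, p in enumerate(probs))

for d, p in enumerate(probs):
    print(d, f"{p:.4f}")
print(f"expected mean of second digit: {expected_mean:.3f}")
```

The second-digit probabilities only drift from about 12% for “0” down to about 8.5% for “9”, which is why a histogram of second digits looks nearly flat.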

How about the Trump totals? The average of the second digit for his tallies is 3.77. That’s statistically different. This would only happen by chance alone 1 in 1000 times.
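I won’t claim this is the only way to run the comparison, but one plausible version is a one-sample z-test of the mean second digit against 4.187. The sketch below is an assumption-laden illustration: the tallies are made up for the demo (they are not the Connecticut data), and the function names are mine.

```python
import math
import statistics

BENFORD_SECOND_DIGIT_MEAN = 4.187  # expected value quoted in the article

def second_digit(tally: int) -> int:
    """Second digit of a vote tally; the tally must be at least 10."""
    return int(str(tally)[1])

def mean_second_digit_test(tallies):
    """Return (sample mean, z statistic, two-sided p-value) using a normal
    approximation and the sample standard deviation."""
    digits = [second_digit(t) for t in tallies if t >= 10]
    mean = statistics.mean(digits)
    standard_error = statistics.stdev(digits) / math.sqrt(len(digits))
    z = (mean - BENFORD_SECOND_DIGIT_MEAN) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return mean, z, p_value

# Hypothetical tallies, invented purely to exercise the function.
hypothetical_tallies = [1357, 2048, 913, 1502, 764, 1189, 2231, 645, 1710, 1066]
mean, z, p_value = mean_second_digit_test(hypothetical_tallies)
print(f"mean second digit = {mean:.2f}, z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value here says only that the digits look unusual under this one assumption. It says nothing about who, if anyone, did anything.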

Now take that number and run with it. Can it tell you that voter fraud favored Clinton? No. But who cares. You’re just asking questions, remember? One in 1000! At the very least, this data should allow you to pass some voter ID laws (and, after all, isn’t that really the point?).

Option 3: Use Modeling

If it’s good enough for Nate Silver, it should be good enough for us, right?

Models are fun. We can build them using statistical software to make predictions. Sometimes they work really well. Sometimes they don’t. But as a means to an end, they are perfect.

First, we’ll make a model to predict how many votes Trump should have received in each polling place.

I took voter registration data from each of my 682 Connecticut polling places, and tried to predict the number of Trump votes. As you might expect, the more Republicans there were, the more votes Trump got. He also did well with “unaffiliated” voters. Here’s a graph:

Not too shabby. Our model is pretty good at predicting Trump votes. But there is a dot that seems out of place (I circled it in red for you). An outlier. Or, in alt-speak: a highly suspect polling station. Trump got WAY fewer votes here than we expected. About 1500 “missing” votes, you might say.

We can actually measure how far off this polling place is from the expected value statistically, and report just how unlikely it is.

We predicted about 2500 votes for Trump from that polling place, but only got a bit above 1000. How unlikely was that? Get this: it’s a 1 in 3.6 million chance. This is pure propaganda gold, a baby-blue boxed gift to Kellyanne Conway. By the way, I would mention which polling place this was if I weren’t worried that someone would show up there to do some vigilante-style investigation.
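How does a gap between predicted and observed votes turn into a “1 in millions” headline? Roughly like this: divide the gap by the model’s typical error and read off a normal tail probability. The sketch below is only an illustration. The residual standard deviation of 300 is a made-up value (the model’s actual error isn’t reported here), so the output is not meant to match the figure above, and the function name is mine.

```python
import math

def two_sided_tail(observed: float, predicted: float, residual_sd: float):
    """Standardize a residual and return (z, two-sided normal tail probability)."""
    z = (observed - predicted) / residual_sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# 1000 observed vs. 2500 predicted; residual_sd=300 is an assumed value.
z, p_value = two_sided_tail(observed=1000, predicted=2500, residual_sd=300)
print(f"z = {z:.1f}, about 1 in {1 / p_value:,.0f}")
```

The scary number depends entirely on the assumption that the model’s errors follow a bell curve all the way out into the tails. With a few hundred polling places and a simple model, that assumption is exactly what an outlier calls into question.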

Clear evidence of voter fraud, and in Connecticut of all places. Hillary central. Case closed. Extrapolate this to the rest of the country, and boom: 3 million votes, no problem. At the very least, this data should allow you to pass some voter ID laws (and, after all, isn’t that really the point?).

Modeling can be used for more than predicting expected votes, though. A particularly nefarious strategy would be to use a model to predict whether or not someone is an illegal immigrant (based on factors like ethnicity, duration of residence, zip code, etc). Models always have false positives. Worse models have more false positives. Using such a model you could show not that illegal immigrants voted, but that people who seem like illegal immigrants voted. But those little details will be left out of the discussion.
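To see why the false positives matter so much, here is a rough sketch of the base-rate arithmetic. Every number in it is hypothetical, chosen only to show the mechanism: a model that looks accurate on paper will still flag mostly the wrong people when the thing it hunts for is rare.

```python
def flagged_breakdown(population, prevalence, sensitivity, specificity):
    """Split the people a classifier flags into true and false positives."""
    true_positives = population * prevalence * sensitivity
    false_positives = population * (1 - prevalence) * (1 - specificity)
    return true_positives, false_positives

# All inputs below are hypothetical illustration values.
true_pos, false_pos = flagged_breakdown(
    population=1_000_000,  # voters screened
    prevalence=0.001,      # assumed share who truly belong to the target group
    sensitivity=0.90,      # assumed share of true cases the model catches
    specificity=0.95,      # assumed share of everyone else correctly cleared
)
share_correct = true_pos / (true_pos + false_pos)
print(f"true positives: {true_pos:,.0f}, false positives: {false_pos:,.0f}")
print(f"share of flagged people who are real cases: {share_correct:.1%}")
```

Under these made-up inputs, fewer than 2 in 100 flagged voters are real cases. The rest are citizens who merely “seem like” something to a model.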

Remember this when the results of the “investigation” are in. There will be no evidence of voter fraud.

What there will be are minor statistical aberrations combined with bad methods, which will lead to “serious questions about the safety of our voting system”. Don’t believe it when they say it. And don’t allow these tricks to disenfranchise real American citizens via draconian voter ID laws.

There will be no real proof of voter fraud, there will only be “questions”. There will be no names, only numbers. There will be no evidence, only conspiracy theories.

Writing about medicine, science, statistics, and the abuses thereof. Commentator at Medscape. Associate Professor of Medicine at Yale University.
