Even politically sophisticated journalists sometimes have a hard time understanding how a couple of thousand individuals can accurately represent the entire U.S. population. But they can, so long as the pollster draws a good sample. Jack Hart, Managing Editor of the Oregonian, explains.

Reporting the Margin of Error

Big stories on polls and surveys should always report the confidence level and the margin of error. Those two figures give readers what they need to determine whether two numbers are significantly different. If a poll determines that 49 percent of voters favor Candidate A and 51 percent favor Candidate B, for example, how confident are we that Candidate B is actually leading?
We're pretty good about mentioning the confidence interval in poll stories. In fact, more than 500 stories have done so in the past three years. We're much less consistent in explaining the margin of error in a way that makes sense to readers.
A little murky on the concept? If you don't know and don't care -- or even if you do know and sorta care -- here's a foolproof formula for reporting the confidence level and the margin of error:
The chances are 95 out of 100 (the confidence level) that the results are within plus or minus X percentage points (the margin of error) of the true value for the whole population sampled.
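For anyone curious where the X comes from, here's a rough sketch. It assumes a simple random sample and the standard textbook formula for a percentage, neither of which is spelled out above. The function name `margin_of_error` and the 1,000-person sample size are made up for illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error (as a proportion) for a simple random sample.

    p: share of the sample giving one answer (0.51 for 51 percent)
    n: number of people polled
    z: 1.96 corresponds to the 95 percent confidence level
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 respondents, 51 percent favor Candidate B.
moe = margin_of_error(0.51, 1000)
print(f"plus or minus {moe * 100:.1f} percentage points")
# prints "plus or minus 3.1 percentage points"
```

Under those assumptions, a 51-49 split sits well inside the margin of error, so the poll alone can't tell us that Candidate B is really ahead.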

How Polls Work

Even politically sophisticated journalists sometimes have a hard time understanding how a couple of thousand individuals can accurately represent the entire U.S. population. But they can, so long as the pollster draws a good sample. The experts have proved their ability to predict voting patterns, within a couple of percentage points, time and time again.

George Gallup likes to explain it with his soup analogy. One spoonful of soup can accurately represent the taste of the whole pot so long as everything is well-stirred.

The stirring's the key. That's what introduces the random element that's so important to scientific sampling. Randomness, in turn, is what brings the laws of probability into play. And the laws of probability are what make accurate polling possible.

Everything depends on what's called the "sampling distribution." The idea is this: If you draw repeated samples out of the same population, the averages of those samples distribute themselves in a normal (bell-shaped) curve around the average for all possible samples. And that average will exactly equal the average for the whole population.

That's what probability theorists call the Central Limit Theorem.

Let's say you're sampling heights from a population of 100 men who, on average, are 6 feet tall. You take repeated random samples of 10, drawing every one of the trillions of unique samples possible.

You'll get more samples with average heights close to six feet because there are more of those men and you're more likely to select them. You'll get damned few samples that average 5 or 7 feet because you're unlikely to get many extremely short or tall men in any given sample. So the total "sampling distribution" will form a normal curve that falls away on both sides of the point representing the average of all samples, which is exactly 6 feet.
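You can watch the curve take shape with a quick simulation. This is only a sketch: the 100 heights are invented (generated around 72 inches with a spread of 3 inches), and it draws 5,000 random samples of 10 rather than every possible one.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 100 men whose heights (in inches) cluster around 72.
population = [random.gauss(72, 3) for _ in range(100)]
population_mean = statistics.mean(population)

# Draw 5,000 random samples of 10 men each and record each sample's average.
sample_means = [
    statistics.mean(random.sample(population, 10)) for _ in range(5000)
]

# Crude text histogram: count the sample averages falling in each 1-inch bin.
bins = {}
for m in sample_means:
    inch = int(m)  # an average of 71.8 lands in the 71-inch bin
    bins[inch] = bins.get(inch, 0) + 1

print(f"population average: {population_mean:.1f} inches")
for inch in sorted(bins):
    print(f"{inch} in. {'#' * (bins[inch] // 50)}")
```

The longest bars should sit right around the population average, with the bars shrinking quickly on either side.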

The magic of this is that the average for the sampling distribution -- 6 feet -- is exactly the same as the average for the whole population. Always.
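That "always" is easy to check on a population small enough to enumerate. The sketch below uses eight invented heights and draws every possible sample of three; the numbers are hypothetical, but the arithmetic is exact because it uses fractions rather than decimals.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical mini-population: heights in inches for 8 men (average 72).
heights = [66, 69, 70, 71, 72, 73, 75, 80]
population_mean = Fraction(sum(heights), len(heights))

# Every possible sample of 3 men (8 choose 3 = 56 samples).
samples = list(combinations(heights, 3))
sample_means = [Fraction(sum(s), len(s)) for s in samples]

# Average of all the sample averages, computed exactly.
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(len(samples))                             # 56
print(mean_of_sample_means == population_mean)  # True
```

Each man turns up in the same number of samples, which is why the averages balance out to the population figure.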

Of course, you don't actually draw the entire sampling distribution. The beauty of this thing is that you don't have to.

Every sampling distribution forms a normal curve. And for every normal curve, you can calculate the "standard deviation," which describes how spread out the curve is and is based on how far each individual value falls from the average for the whole group. It tells you, in other words, how flat or steep any given normal curve happens to be.

The critical fact of nature is that in every normal curve about two-thirds of the things being measured will be within one standard deviation of the average. About 95 percent will be within two standard deviations.

So every time we draw a random sample, there's a 95 percent chance that it falls within two standard deviations of the average for all samples, which is the same as the average for the whole population.
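The two-thirds and 95 percent figures aren't rules of thumb pulled out of the air. They fall straight out of the math of the normal curve, as this short check shows. The helper name `share_within` is just an illustration.

```python
import math

def share_within(k):
    """Share of a normal curve lying within k standard deviations of the average."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
```

Strictly speaking, exactly 95 percent corresponds to about 1.96 standard deviations, which is why "two" is a convenient rounding.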

Voila! We have everything we need to figure out how much faith we should have in our sample.

First of all, we know that we can be 95 percent confident that our results are within plus or minus two standard deviations of the true figures for the population we've sampled. That 95-percent figure is what we call the confidence level.

Second, we can calculate the value of one standard deviation for this sampling distribution. Once we have that, we can calculate the number of percentage points that two standard deviations represent in either direction. That's the margin of error. So we have the two figures -- the confidence level and the margin of error -- that are critical to assessing the value of any poll.

In the case of our example, let's say that we draw one random sample of 10 and that the average man in that sample is 5-foot-10. Let's also say that we calculate the standard deviation in heights for a sampling distribution of 10-man samples drawn from a 100-man population. Let's say it happens to be 2 inches. Since the true average is 6 feet, that means that 95 percent of the possible samples will average between 5 feet 8 inches and 6 feet 4 inches, or 4 inches either side of 6 feet. So we can say that the chances are 95 out of 100 that the average height in our sample -- 5-foot-10 -- is within plus or minus 4 inches of the true average height for the whole population.
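Here's the same arithmetic laid out step by step, using only the numbers from the example. The variable names and the `to_feet_inches` helper are invented for the sketch.

```python
def to_feet_inches(total_inches):
    """Format a whole number of inches as feet and inches."""
    feet, inches = divmod(total_inches, 12)
    return f"{feet} feet {inches} inches"

population_average = 72  # 6 feet, in inches
standard_deviation = 2   # of the sampling distribution, as assumed in the example
sample_average = 70      # our one sample: 5-foot-10

# The margin of error is two standard deviations in either direction.
margin_of_error = 2 * standard_deviation

low = population_average - margin_of_error
high = population_average + margin_of_error
print(to_feet_inches(low), "to", to_feet_inches(high))
# prints "5 feet 8 inches to 6 feet 4 inches"

# Is our sample one of the 95 percent that land inside that range?
print(abs(sample_average - population_average) <= margin_of_error)  # True
```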

That doesn't mean we won't get an occasional wild sample that's way off the mark. We will, 5 percent of the time. And it doesn't mean that other sources of error won't bias our results. We could have done a bad job measuring, for example.

But we can be absolutely certain of the confidence level and the margin of error. Those result from the Central Limit Theorem. And they're just as reliable as the Theory of Relativity, the formula for pi or any other law of the universe.

And that's why George Gallup has been able to call every presidential election within 3 percentage points since 1952.
