Understanding the basics of polling is a challenge. Kathleen Woodruff Wickham, assistant professor of journalism at The University of Mississippi, Oxford, Miss., explains basic statistical techniques. This exercise will help you understand surveys and how to report them in the paper.

Please don’t eat the data: M & Ms and polls

Understanding the basics of polling is a challenge. A candidate who appears to be ahead based on the percentages might not actually be ahead when the figures are examined using basic statistical techniques.

For example, someone not familiar with statistical analysis might report:

Candidate Bob Cressman is ahead in the polls following a debate with Candidate Denise Westbury based on a poll done by Voters Choice that found 53 percent of those polled favored Cressman compared to 47 percent favoring Westbury.

However, an important fact, the margin of error, is often overlooked in reporting poll results. If the margin of error is, for example, plus or minus 4 percentage points, the two candidates' ranges overlap (47 + 4 = 51 vs. 53 - 4 = 49), so the poll cannot show that either candidate is actually ahead. In this case, the apparent 6-point lead could be the result of sampling error alone.
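
The overlap check above can be sketched in a few lines of Python. This is a minimal illustration, not a formal statistical test; the function name `ranges_overlap` is made up for this example.

```python
def ranges_overlap(leader_pct, trailer_pct, margin):
    """Return True if the two results could overlap once the margin of error is applied."""
    leader_low = leader_pct - margin      # lowest plausible share for the leader
    trailer_high = trailer_pct + margin   # highest plausible share for the trailer
    return trailer_high >= leader_low


# Cressman 53, Westbury 47, margin of error plus or minus 4 points
print(ranges_overlap(53, 47, 4))  # 47 + 4 = 51 vs. 53 - 4 = 49, so the ranges overlap
```

When the function returns True, the safer wording is that the race is too close to call.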

What do we mean by error? No survey or poll is perfect. A sample can differ from the whole population simply by the luck of the draw (for example, too many men or too many middle-aged people), and the margin of error is the statistical tool used to express how large that sampling error is likely to be. This is the simple explanation. There are other types of error, such as biased or unclear questions, that are unseen and might affect the actual outcome; strictly speaking, the margin of error does not account for those.

The margin of error is based on the sample size -- the more people polled the smaller the possible error. Statisticians have generated charts showing the margin of error based on sample sizes. It is not an arbitrary figure.
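
Charts of this kind are typically built from a standard textbook formula for a simple random sample. The sketch below assumes that formula, the 95 percent confidence level (z = 1.96) and the most conservative split of opinion (50/50); the function name `margin_of_error` is made up for this example, and a given chart may rest on different assumptions.

```python
import math


def margin_of_error(sample_size, z=1.96, proportion=0.5):
    """Margin of error in percentage points for a simple random sample.

    z=1.96 corresponds to the 95 percent confidence level; proportion=0.5
    is the conservative (widest) assumption. Both are assumptions here.
    """
    return 100 * z * math.sqrt(proportion * (1 - proportion) / sample_size)


# The more people polled, the smaller the margin of error
for n in (100, 400, 1000, 2500):
    print(n, round(margin_of_error(n), 1))
```

Note that quadrupling the sample (100 to 400) only halves the margin of error, which is why very large polls bring diminishing returns.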

To demonstrate margin of error try this exercise:

  1. Using bags of M & Ms (or other colored candy), assign each person a different color to count. Assign each candidate/issue one or more colors. Advise group members not to eat the data and to include the cracked M&Ms in the count. (Eating the data is unethical.)
    Or, as an alternative, give one group two bags, another group three bags and a third group four bags so that the size of the sample varies and, thus, the margin of error varies.
  2. Consult a statistician's manual/textbook to determine the margin of error for your sample (total number of M&Ms counted by everybody).
    If using the alternative exercise, compare the margins of error for the different groups/polls.
  3. Find out what percentage of the whole each color is (in my class exercise we arbitrarily assigned red and brown to Cressman, blue and yellow to Westbury and green to the third party candidate—Dennis Fletcher).

    For our examples: Assume all the M&Ms in three bags were counted and the total came to 2,642 M&Ms.

    Cressman had 1,360 "votes" or 51.4 percent of the votes

    Westbury had 930 "votes" or 35.2 percent of the votes

    Fletcher had 352 "votes" or 13.3 percent of the votes

  4. The margin of error chart indicates that with a sample of 2,642 the margin of error is plus or minus 0.44 percent at the 99 percent confidence level and 1.4 percent at the 95 percent confidence level. (I'll explain confidence level in a moment.)

    For this example, choose the 95 percent confidence level figure.

    To determine whether the difference in the percentages is real, subtract the margin of error from the largest number and compare the result with the highest possible value of each of the lower two numbers.

    51.4 - 1.4 = 50

    Add the margin of error to the lower figure(s)

    35.2 + 1.4 = 36.6

    13.3 + 1.4 = 14.7

    Because 50 is still well above 36.6 and 14.7, at the 95 percent confidence level Cressman is clearly ahead.

  5. We can write that:

    In the Voters Choice poll conducted after the debate, 51.4 percent of those polled supported candidate Bob Cressman and 35.2 percent supported candidate Denise Westbury. The poll had a margin of error of plus or minus 1.4 percent and was evaluated at the 95 percent confidence level.
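
The arithmetic in steps 3 and 4 can be checked with a short Python sketch. It uses the counts from the example and takes the 1.4-point margin as given from the chart. Exact division puts Cressman at 51.5 percent rather than 51.4, a rounding difference that does not change the conclusion.

```python
counts = {"Cressman": 1360, "Westbury": 930, "Fletcher": 352}
margin = 1.4  # percentage points, the 95 percent figure used in the example

total = sum(counts.values())  # 2,642 M&Ms
shares = {name: 100 * votes / total for name, votes in counts.items()}

# Rank the candidates from highest to lowest share
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
leader, leader_pct = ranked[0]
runner_up, runner_up_pct = ranked[1]

leader_low = leader_pct - margin          # worst case for the leader
runner_up_high = runner_up_pct + margin   # best case for the runner-up

for name, pct in ranked:
    print(f"{name}: {pct:.1f} percent")
print(f"{leader} at worst: {leader_low:.1f}; {runner_up} at best: {runner_up_high:.1f}")
print("Clear lead" if leader_low > runner_up_high else "Too close to call")
```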

Confidence level: what’s that?

Another issue in reporting polls is the confidence level, which is a bit more complicated to understand, but goes to the heart of accurate reporting.

When conducting polls, researchers assign in advance a number called a confidence level. This is the level (percent) at which they have confidence in the results. Put more precisely, a 95 percent confidence level means that if the same poll were repeated many times, about 95 out of 100 of the results would fall within the margin of error of the true figure for the whole population. Researchers select the confidence level in advance based on pre-testing and previous research. The purpose is to prevent the researcher from being tempted to tweak the results for a more favorable result.
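
One common way to picture a 95 percent confidence level is through repeated sampling. The simulation below is a rough sketch under stated assumptions: the true level of support, the poll size and the number of simulated polls are all hypothetical values chosen for illustration, and the margin uses the standard simple-random-sample formula with z = 1.96.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

TRUE_SUPPORT = 0.53   # hypothetical true share in the whole population
SAMPLE_SIZE = 600     # hypothetical number of people in each poll
POLLS = 2000          # number of simulated polls
Z_95 = 1.96           # critical value for the 95 percent confidence level

hits = 0
for _ in range(POLLS):
    # Each respondent supports the candidate with probability TRUE_SUPPORT
    votes = sum(random.random() < TRUE_SUPPORT for _ in range(SAMPLE_SIZE))
    share = votes / SAMPLE_SIZE
    margin = Z_95 * math.sqrt(share * (1 - share) / SAMPLE_SIZE)
    # Count the poll as a hit if the true value lies within its margin of error
    if share - margin <= TRUE_SUPPORT <= share + margin:
        hits += 1

print(f"{hits / POLLS:.1%} of simulated polls captured the true value")
```

The printed share should land close to 95 percent, which also means roughly one poll in 20 misses the true figure by more than its stated margin.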

The confidence level is usually reported at the 95 percent or 90 percent level, but can go higher or lower (think of it this way--if you were taking a new drug would you want one tested at the 98 percent confidence level or the 75 percent confidence level?).

The confidence level figure is part of the same chart as the margin of error chart. The confidence level should always be reported as part of the story because it gives the reader a chance to assess the results.
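
To see how the two figures on such a chart relate, the sketch below computes the margin of error for one sample size at several confidence levels. It assumes the standard formula for a simple random sample and a 50/50 split of opinion; the function name `margin_at_confidence` is made up for this example, and a printed chart may rest on different assumptions.

```python
import math
from statistics import NormalDist


def margin_at_confidence(sample_size, confidence, proportion=0.5):
    """Margin of error (percentage points) at a given confidence level."""
    # Two-sided critical value, e.g. 0.95 gives about 1.96
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 100 * z * math.sqrt(proportion * (1 - proportion) / sample_size)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: +/- {margin_at_confidence(2642, level):.1f} points")
```

Under this formula, a higher confidence level always produces a wider margin for the same sample, so it is worth double-checking any chart reading in which the 99 percent figure is smaller than the 95 percent one.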

Final notes:

Just because someone does a survey does not mean it has validity. Always consider, and report:

  1. Who sponsored the survey: organizations seeking to promote their point of view may report slanted results based on their views. The credibility of the polling organization goes to the heart of interpreting the accuracy of the results. Does the sponsoring organization have a vested interest in the outcome? Is the organization experienced in polling? What were the questions? Are the results pulled from the larger sample with no change in the margin of error?
  2. The size of the sample: the larger the sample and the more random the polling the more accurate the results. The sample should represent the population being queried. Consider the impact of varying demographics, limiting questions to only those people with listed phone numbers and making contact during limited hours of the day. These factors could impact the outcome. If a telephone survey is conducted and only those folks with listed numbers are called the poll is not random.
  3. The method used to collect the data: random or self-selected. Random samples are random in that every person in a group has an equal chance of selection. The results of random samples can be applied to a larger group because every person in the study group had an equal opportunity to be selected. Self-selecting polls only report the results of people who cared enough, one way or another, to call a particular number, reply by email or agree to be interviewed in person in a public place (think of mall surveys). The results cannot be generalized beyond those who choose to answer.
  4. The actual questions: wording can impact the results. Are the questions clear with only one topic included in each question? Were the questions balanced with both sides represented? Were the questions easy to understand? Can you detect any bias in the questions that could impact the results? Were questions open-ended or multiple-choice? Open-ended questions open the door to bias in interpretation. Multiple coders are needed to decrease the opportunities for bias, and their use should be reported in the report’s methodology section. Sometimes organizations will combine the answers from questions to be able to report a more favorable result.
  5. Your own review of the numbers: get the raw data, if you can. If not, look over any charts or supplementary information provided by the poll’s sponsor. Do the math yourself. Determine how the poll’s sponsors came up with the figures. Their numbers might not be accurate. Did those reporting the results pull one group out from the whole? That changes the margin of error and the random sample parameters.
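
The last point in item 5, that pulling one group out of the whole changes the margin of error, can be illustrated with a short sketch. The sample sizes are hypothetical, and the formula is the standard one for a simple random sample at the 95 percent confidence level with a 50/50 split assumed.

```python
import math


def margin_of_error(sample_size, z=1.96, proportion=0.5):
    """Margin of error in percentage points (95 percent level by default)."""
    return 100 * z * math.sqrt(proportion * (1 - proportion) / sample_size)


full_sample = 1000  # hypothetical poll size
subgroup = 150      # hypothetical subgroup pulled from that poll

print(f"Full sample of {full_sample}: +/- {margin_of_error(full_sample):.1f}")
print(f"Subgroup of {subgroup}: +/- {margin_of_error(subgroup):.1f}")
```

A result reported for the subgroup alone carries the larger margin, so the figure quoted for the full poll should not be attached to it.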