Wednesday, April 5, 2006

How the Surveys Have Lost Their Sting

or, PUBLIC OPINION POLLING HAS BECOME A GENRE OF JOURNALISM
Over the years, public opinion polling in the Philippines has acquired a generally good reputation. More than any other, the Social Weather Stations has earned the trust of the public, with Pulse Asia Inc. trying very hard to be a worthy Avis to its Hertz at No. 2. Both have firmly established scientific public opinion polling in the Philippines as surely as Gallup once did in the United States. But that reputation and trust are like the reputation and trust that people accord to newspapers, TV, radio and journalism in general. And not all polls are created equal. As in the mass media itself, there are also "tabloid" polling practitioners -- like every absolutely meaningless radio and TV call-in poll, which are really "For the Entertainment of Fools Only."

SURVEYS HAVE LOST THEIR STING Thus, the audience for surveys has matured: people have developed a healthy skepticism towards the surveys, while giving them the same benefit of the doubt that they give to other journalists who report, analyze and interpret the vast and ever-changing entities called Current Events and Public Opinion. Another survey? Ho-hum, most people say. This skepticism is well-founded, not because the pollsters are dishonest or their results inaccurate, but because people sense the limitations of their craft. From a mere 1200 respondents, pollsters are expected to divine great generalities about 42 million adult citizens. People also know that the logistics of conducting a survey make it necessarily just a snapshot of fast-moving events and changing situations. And people have seen survey questions that seem "biased" or "loaded." Surveys have lost their sting. The power of the surveys has faded as people have become jaded.

POLLSTERS ARE PART OF THE MASS MEDIA People have learned that just because a public opinion pollster uses statistical and scientific techniques in the collection and analysis of its raw data does not mean that its product is pure scientific information. People detect in pollsters a familiar behavior that they find in ordinary journalists -- they editorialize on their own data. The Social Weather Stations reports are not given the same scientific value as, for example, meteorological weather reports from Pag-asa or volcanological bulletins from Phivolcs. One can see this difference clearly in the fact that we all seem to find ourselves "agreeing" or "disagreeing" with the published results of public opinion polls in a manner that never occurs when we hear the morning weather forecast or view a satellite image of the archipelago. Public opinion pollsters are now regarded by the public as closer cousins to newspapers and other forms of news and opinion journalism than to physical weather monitoring agencies. Public Opinion Polling Is a Genre of Journalism.

IMPORTANCE OF SURVEY AND QUESTION DESIGN: Because every public opinion survey is based on questions designed by the pollster, that pollster gets to analyze, report upon and interpret the results of the survey for the mass media and the general public. There is, in other words, an editorializing function that pollsters also perform. Just as newspaper editorials often interpret news and even other opinion, so too do public opinion pollsters comment on their own data. Sometimes, I have noticed, the Media Release wrappers of both Pulse Asia and SWS do not accurately reflect what their own survey data actually say when strictly and logically interpreted, because the design of the survey -- especially the number, manner, sequence and mix of the questions asked -- has everything to do with the interpretation by the pollster and the subsequent reportage. That subsequent reportage also represents a very important effect. Few journalists and media people in the Philippines are qualified to properly interpret the statistical data collected in a survey, nor do many of them seem inclined to learn the mathematical and scientific basics, as evidenced by the truly ignorant reporting that goes on about what a survey says -- sometimes from the pollster itself!

2006 SWS NATIONAL SURVEY RESULTS AND SPIN Take for example the headline at the website of the Social Weather Stations today: First Quarter 2006 Social Weather Survey: Options For Toppling GMA: Coup Gets Split Opinions, People Power Gets 48%, Pro-Resign Gets 44%

SWS asked 1200 randomly selected adult registered voters from all over the Philippines whether they AGREED, DISAGREED or were UNDECIDED about the following statements:

(1) If President Arroyo resigns, it will be good for the country.

(2) It is good for the country if PGMA will be removed by a People Power.

(3) It is good for the country if PGMA will be removed by means of a military coup.

The results are summarized in TABLE 1 from the SWS website, which I've copied below with an added TOTAL column.
SWS 2006 March -- "It would be good for the country if..."

                                        Agree   Undecided   Disagree   TOTAL
  GMA Resigns                             44        29          23       96
  GMA is removed by a People Power        48        21          27       96
  GMA is removed by a Military Coup       36        23          35       94
So why don't the numbers add up to 100% along each row? You have to read some very fine print on Table 1 and each of the accompanying Tables on the SWS website to find out that ... "Don't Know and Refused Responses are not shown."

As a scientist, I don't like throwing away good data or excluding them from the final report on a scientific investigation. I don't know how the missing 4%, 4% and 6% respectively break down between Don't Know and Refused responses; in fact, together with Undecided, these categories seem to belong to a single one called NEITHER, which produces the following more accurate reporting of SWS's own data...

SWS 2006 March -- "It would be good for the country if..."

                                        Agree   Neither   Disagree   Total
  GMA Resigns                            44%      33%        23%      100%
  GMA is removed by a People Power       48%      25%        27%      100%
  GMA is removed by a Military Coup      36%      29%        35%      100%
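
For readers who want to check the arithmetic themselves, here is a minimal Python sketch that rebuilds the NEITHER column from the published figures (the numbers are copied from the SWS table above; the folding of Don't Know and Refused into NEITHER is my own choice, as explained):

```python
# Rebuild the NEITHER column from the published SWS March 2006 percentages.
# Each row: (statement, Agree %, Undecided %, Disagree %) as published;
# whatever falls short of 100% is the unshown Don't Know / Refused share.
rows = [
    ("GMA Resigns",                       44, 29, 23),
    ("GMA is removed by a People Power",  48, 21, 27),
    ("GMA is removed by a Military Coup", 36, 23, 35),
]

for statement, agree, undecided, disagree in rows:
    missing = 100 - (agree + undecided + disagree)  # Don't Know + Refused
    neither = undecided + missing                   # fold into one NEITHER category
    print(f"{statement}: Agree {agree}%, Neither {neither}%, "
          f"Disagree {disagree}%, Total {agree + neither + disagree}%")
```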
Now look again at the headline and article of SWS announcing these results at their website:
First Quarter 2006 Social Weather Survey: Options For Toppling GMA: Coup Gets Split Opinions, People Power Gets 48%, Pro-Resign Gets 44%

Why did the SWS headline mention the 48% for "People Power Is Good" and the 44% for "Resign Is Good" but not the 36% that "Military Coup Is Good" got?

I'm not sure if there was a sinister or merely technical reason. But as it stands, the headline GIVES THE IMPRESSION that (48 + 44) or 92% are actually for removing GMA by EITHER People Power or Resignation. That impression would not be given if the headline had read "People Power 48%, Pro-Resign 44%, Military Coup 36%," because it would then become obvious that SWS did not present mutually exclusive choices to the respondents, since the questions they asked were of the form "Do you agree or disagree that it would be good for the country if GMA resigned?" The fact of the matter is, SOME people who agree that "it would be good" if GMA resigned also agreed that "it would be good" if GMA were ousted by People Power or a Military Coup. I suspect that a core of common respondents sits in the AGREE columns of all three questions, and likewise in the DISAGREE columns.
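
A toy illustration makes the double-counting danger concrete. The respondent sets below are pure invention (SWS does not publish respondent-level data); the point is only that the same person can land in the AGREE column of more than one question:

```python
# Hypothetical: 100 respondents, two separate agree/disagree questions.
# The same person may agree with both statements, so the two agree
# percentages overlap and cannot simply be added together.
agree_resign = set(range(0, 44))         # 44 agree "resign would be good"
agree_people_power = set(range(10, 58))  # 48 agree "People Power would be good"

overlap = agree_resign & agree_people_power  # respondents counted in BOTH groups
either = agree_resign | agree_people_power   # respondents agreeing with at least one

print(f"Agree resign:       {len(agree_resign)}%")        # 44%
print(f"Agree People Power: {len(agree_people_power)}%")  # 48%
print(f"Counted twice:      {len(overlap)}%")             # 34%
print(f"Agree with either:  {len(either)}%")              # 58%, nowhere near 92%
```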

WHEN THE UNDECIDEDS HAVE IT Hardly emphasized in either the SWS reporting or the subsequent media reporting is the glaringly large percentage of UNDECIDED -- or NEITHER, as I've already pointed out. Whenever I run into a survey whose UNDECIDED percentage is even larger than AGREE or DISAGREE, there are usually two possibilities:

(1) People don't care either way, or,

(2) Public Opinion on the survey subject is in a period of rapid change.

But just look at those double-digit UNDECIDED percentages in the March 2006 SWS survey. They are huge compared to, say, the margin of error of plus or minus 3% in these 1200-respondent surveys.

SPEAKING OF MARGIN OF ERROR... You have just stumbled upon one of my favorite topics in all of public opinion polling mathematics. What most people don't appreciate is that the Margin of Error in a scientific survey can be used as a kind of BULLSHIT DETECTOR on the validity and significance of any interpretation or editorializing that might be made of the raw statistical data. This topic is so rich with popular misconceptions and vague concepts that a detailed discussion may be useful to Philippine Commentary readers.

First look at the following "standard disclaimer" which comes from Pulse Asia Inc.'s recent Media Release on Chacha, but which also applies rigorously to SWS--
Based on a multistage probability sample of 1,200 representative adults 18 years old and above, Pulse Asia’s nationwide survey has a plus-or-minus 3 percent error margin at the 95 percent confidence level. Subnational estimates for each of the geographic areas covered in the survey (i.e., Metro Manila, the rest of Luzon, Visayas and Mindanao) have a plus-or-minus 6 percent error margin, also at 95 percent confidence level.
WHERE DO THESE NUMBERS COME FROM? Have you ever asked yourself where these numbers come from and how to use them in reading the surveys? I suspect that most people do not know the answers, yet they are the key to taking a public opinion pollster's data and making sense of them yourself, without having to rely on the pollster or other media to interpret them for you!

MARGIN OF ERROR DEPENDS ON SQUARE ROOT OF SAMPLE SIZE The first thing to understand is why the "nationwide survey" has a plus or minus 3% error margin while the "subnational estimates" for Metro Manila, the rest of Luzon, Visayas and Mindanao have a plus or minus 6% error margin. The answer is this. The nationwide survey takes into account ALL of the 1200 respondents randomly picked from the total population of over 40 million adult registered voters in the Philippines, while each of the "subnational estimates" comes from the responses of a smaller component group of 300 respondents randomly picked from the NCR, rest-of-Luzon, Visayas or Mindanao populations. But why plus or minus 3% and 6% respectively, when the subnational samples are one fourth the size of the total nationwide sample? The reason is mathematical. A simple and famous rule of thumb says the statistical error margin is equal to 100% divided by the SQUARE ROOT OF THE SAMPLE SIZE. (The "plus or minus" signifies that, 95% of the time, the true population percentage lies within that distance above or below the measured sample percentage; the uncertainty runs symmetrically in both directions.)

WHERE THE 3% COMES FROM Thus, for a random sample of 1200 respondents, the error margin is 100% divided by the square root of 1200 (about 34.641), which equals 2.89% -- this is where the usual error margin of plus or minus 3% comes from, after rounding up.

Next, for a random subnational sample of 300 respondents (four of which make up the national sample), the error margin is 100% divided by the square root of 300 (about 17.32), which equals 5.77% -- this is where the plus or minus 6% for the subnational estimates comes from, after rounding up.
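
Those two calculations take only a few lines of Python to verify (this is just the rule-of-thumb formula above, not the exact binomial computation):

```python
from math import sqrt

# Rule-of-thumb 95%-confidence error margin: 100% divided by sqrt(sample size)
def error_margin(n):
    return 100.0 / sqrt(n)

print(f"n = 1200: +/-{error_margin(1200):.2f}%")  # +/-2.89%, reported as 3%
print(f"n =  300: +/-{error_margin(300):.2f}%")   # +/-5.77%, reported as 6%
```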

OF WHAT USE IS THE STATISTICAL ERROR MARGIN? Ask first a different question: why do SWS and Pulse even bother to ask 1200 people? Why not ask, say, just 100 people, which would make all that nasty percent arithmetic easy to do in one's head? Well, according to our formula, if SWS conducted a 100-respondent survey, each of the statistics measured would have a plus or minus ten percent error margin. If they asked a hundred people whether they agreed it would be good for Gloria to resign, and 44% of them said yes, it would only mean that, in 95% of the repetitions of such a survey, the result could easily fluctuate between a low of 34% and a high of 54% -- which would make the survey nearly useless, wouldn't it? This is one important use of the error margin: knowing whether the sample size is reasonably large enough for the question being asked and the precision needed in the answer.
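
You can watch that fluctuation happen in a quick simulation. The sketch below assumes, purely for illustration, that exactly 44% of the whole population agrees, then repeatedly "surveys" 100 and 1200 random respondents:

```python
import random

TRUE_AGREE = 0.44  # assumed population share, for illustration only

def simulate_survey(n, trials=10):
    """Measure the agree % in `trials` random samples of size n."""
    results = []
    for _ in range(trials):
        agrees = sum(1 for _ in range(n) if random.random() < TRUE_AGREE)
        results.append(100.0 * agrees / n)
    return results

print("n = 100 :", [round(r) for r in simulate_survey(100)])
print("n = 1200:", [round(r, 1) for r in simulate_survey(1200)])
# The 100-respondent results routinely stray 5 to 10 points from 44%,
# while the 1200-respondent results stay within about 3 points of it.
```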

NET SATISFACTION RATING HAS TWICE THE STATISTICAL ERROR MARGIN But there is another very important application of the statistical error margin that the pollsters have assiduously avoided explaining or stressing to the public. It has to do with what is called the NET SATISFACTION RATING, which is equal to the DIFFERENCE between those who say they are satisfied with the President's performance and those who say they are dissatisfied with her performance. What most people do not know is that the NET SATISFACTION RATING carries TWICE the statistical error margin of its component statistics! That is because there is no free lunch in the mathematical statistics that underpins this social science of public polling. Whenever you COMBINE statistics that each carry the standard error margin -- by adding or subtracting them -- the worst-case error margin of the result is the SUM of the individual error margins (for products and quotients, it is the relative error margins that add).

In other words, the NET SATISFACTION RATING numbers bandied about have a PLUS OR MINUS SIX PERCENT error margin in them. And if you take the difference between the Net Satisfaction Ratings from two different surveys, the resulting CHANGE will have a worst-case error margin of plus or minus TWELVE PERCENT.
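
In code, the worst-case rule looks like this (a sketch using the plus or minus 3% margin of a 1200-respondent survey):

```python
MARGIN = 3.0  # +/- % error margin of each measured percentage (n = 1200)

def net_satisfaction(satisfied_pct, dissatisfied_pct):
    """Net Satisfaction Rating and its worst-case error margin."""
    nsr = satisfied_pct - dissatisfied_pct
    nsr_margin = MARGIN + MARGIN  # margins add when statistics are subtracted
    return nsr, nsr_margin

nsr, margin = net_satisfaction(29, 54)
print(f"NSR = {nsr} +/- {margin}%")  # -25 +/- 6.0%
```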

So, did the President's Net Satisfaction Rating change from December 2005 to March 2006? Here is the SWS item from their recent national survey --
The March 2006 Social Weather Survey finds 29% satisfied and 54% dissatisfied with the performance of President Gloria Macapagal-Arroyo, for a Net Satisfaction Rating of -25.

Although negative for the seventh consecutive quarter, the President's new rating is a little less bad than in December 2005 when fewer (24%) were satisfied, and the same proportion (54%) were dissatisfied, for a Net Satisfaction Rating of -30.
Since the measured change of 5 points is smaller in magnitude than even the plus or minus 6% error margin of a single Net Satisfaction Rating, let alone the plus or minus 12% worst-case margin on the change itself, the only scientific conclusion that can be drawn is that no real change in the rating has been demonstrated during this period.
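
Concretely, using the worst-case rule sketched above:

```python
# December 2005: 24% satisfied, 54% dissatisfied -> NSR = -30 +/- 6%
# March 2006:    29% satisfied, 54% dissatisfied -> NSR = -25 +/- 6%
nsr_dec, nsr_mar = -30, -25
change = nsr_mar - nsr_dec  # +5 points
change_margin = 6 + 6       # worst case: the two +/-6% margins add up

print(f"Change = {change:+d} +/- {change_margin}%")
print("Statistically significant?", abs(change) > change_margin)  # False
```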

10 comments:

Unknown said...

Wow, Dean!

Enfin! (Finally)...Understanding surveys made easy.

Never did trust survey results in the Philippines whichever way they went.

Perhaps because of my lack of trust in the way Filipinos deal with statistics. I've witnessed instances when so-called authorities have easily "doctored" official statistics to suit a particular audience.

In one instance, the company I was working for had to hire two or three consulting teams in the Philippines to do the same due diligence report in order to provide us with a semblance of "reality".

Amadeo said...

Good analysis and presentation, Dean.

Easily understood, even by a Statistics-challenged mind like mine.

Will mentally bookmark for future reference.

Indeed, the margin of error factor does play quite a part in understanding comparisons.

And isn't 6% such a huge margin as to diminish greatly any relevance or importance of stats? If so, then shouldn't pollsters aim for lower margins? Like larger or more dispersed samples?

Deany Bocobo said...

hb, amadeo, yup no free lunch in statistics...that is what the Margin of Error is really all about. The key idea is that statistical error grows if you get fancy with arithmetic on your raw data.

Bigger samples would definitely help, but that increases costs, just like newspapers hiring more reporters or increasing circulation.

Marcus Aurelius said...

There have been times when I have been confident of discerning certainty in survey results within the margin of error.

In 2004, prior to the USA presidential elections (I mean within weeks prior), polls were coming out repeatedly showing President Bush about 3% or so in front of John Kerry. From poll release to poll release the result was consistent; despite MOEs greater than that 3%, I came to see the polls as an accurate gauge of the American electorate.

The actual election bore my analysis out.

However, my analysis was based on multiple polls over time and not a single poll.

Another interesting lesson in polling came from the Dean vs. Kerry nomination battle. Early on in the Democratic nomination process Dean had the largest percentage of votes (by poll) for the nomination. However, most people missed the large number of undecideds. Those undecideds overwhelmingly went for John Kerry over Howard Dean.

Yeah, the impression given by the headline is 92% in favor of removal of PGMA. Hehehe, had they thrown in the 36% then things would have looked even goofier -- hence the reason they left it out.

Now to the results of the survey we are looking at. Again, large numbers of undecideds. The "PGMA resign" result does not need them as badly as the "anti-PGMA resign" result does. Without further research it is hard to say which way the undecideds will break; my guess is they favor the status quo and will break mostly to disagree, but again, agree only needs a couple of points in two of the three cases.

Good analysis, DJB. Despite the surveys favoring your side in the fight, you look at them honestly and bust them on some (IMO) minor trickery.

Amadeo said...

Hi, Marcus:

I too followed intently the polls during that election cycle. And the pundits did put more reliance on pooled results rather than individual polls. But sorry to say, I never encountered a poll that went beyond the 3% MOE. I may not have seen them all.

But I do recall that when differences registered 3 or fewer percent, it was considered even, or a dead heat.

And that cascading Kerry/Dean turnaround has to be blamed on the little Dean meltdown, when he could not even place second in that first caucus vote.

Marcus Aurelius said...

Amadeo,

Those presidential polls typically have an MOE of 4%-6%. Despite the polls coming in closer than those percentages, I started to put more and more faith in them. Yes, when the results of a poll show overlap due to MOE considerations they are typically considered even, but poll after poll came out similarly. If the results were noise and not true information, then the polls should have been all over the map (within the limited area), but they were not. That is, one week Kerry ahead by 2%, next week Bush by 1%, next week Bush by 2%, next week Kerry by 3%, etc. It didn't happen that way; it was Bush by 3%, Bush by 3%, Bush by 2%, Bush by 3%.

Dean's meltdown was after Iowa. Leading up to Iowa, Dean had a plurality of the decideds, but there was always that huge amount of undecideds, and those undecideds broke for Kerry when it came time to make a decision.

I believe we have to view the undecideds in this case as mostly favoring the status quo. However, it is just a poll.

Deany Bocobo said...

One idea behind polls is to track and predict trends. But there is very little of that kind of analysis done here, I think because of the tendency now to craft the questions with some kind of headline in mind. Not that they cheat... at least not any more than ordinary media do with their data...

Deany Bocobo said...

The mathematical statistics involved in polling is about as hardcore a science as one can get. It's a lot different when you're doing quality control on widgets and other mass-produced goods than when you're trying to divine the opinions of unruly human beings. The design of survey questions is only loosely regulated by the scientific method, and is more closely related to the needs of propaganda or journalism, depending on where in the polling ecosystem the pollster sits. Even the weekly Debate poll is a joke, though that is a case where the question design is good but the data collection method is unscientific (not random sampling). I do respect the scientists and real statisticians who work at SWS, Pulse, and Trends MBL, which collects survey data for them. However, it was proven in the 2004 elections that neither is invulnerable to politics or dirty tricks.

Marcus Aurelius said...

That is, the science of polling is marred by the original sin of man.

domingoarong said...

I agree. Pulse Asia and SWS should change the word "POLL" to "PULL" and "polling" to "pulling" (my leg)!