Friday, October 6, 2006

DOST Tests Could NOT Have Measured Accuracy of the ACMs!

I think even the professional statisticians over at Social Weather Stations or Pulse Asia Surveys will agree with my title which is also the main conclusion of today's post. The DOST tests on Comelec's ACMs used 20,000 test marks corresponding to a statistical margin of error of plus or minus 0.7% -- far too large to be of any utility in judging whether an ACM meets a requirement of 99.995% accuracy. Even a "perfect test run" of 20,000 perfect optical mark reads by a given ACM only allows us to conclude that its accuracy rating could be 100% or its accuracy rating could just as easily be 99.3%! The DOST tests on the ACMs as reported recently by the Ombudsman are statistically INCONCLUSIVE as to whether or not those ACMs met the 99.995% accuracy rating. This comes from simply noticing something in a certain recently very notorious document...

The Ombudsman's Supplemental Resolution absolving Comelec commissioners of criminal fallout from ITF vs. Comelec and the Automated Counting Machines (ACMs) fiasco contains the following on the testimony of Sec. Estrella F. Alabastro of the Dept. of Science and Technology (DOST) and the tests they conducted to see how accurate the ACMs were as Optical Mark Readers:
(A) Secretary Alabastro testified that she was one of the members of the Advisory Council and the Technical Ad Hoc Evaluation Committee (TAHEC) who formulated the policies relating to the technical aspect of the automated election system. That when she was furnished with the list of twenty seven (27) key requirements to be used in the evaluation of the automated counting machines (ACMs) she noted that the accuracy rating that was required is 99.995%, whereas the Request for Proposal (RFP) had a higher accuracy rating of 99.9995%. She said however, that what was adopted in the meetings she had with the COMELEC and the Advisory Council is a 99.995% accuracy level and not the 99.9995% since the ACMs will be tested to read only 20,000 marks and not 200,000 marks.
Her basic information is that DOST tested the machines for a 99.995% accuracy rating by feeding them 20,000 test marks on sample ballots and measuring the error rate.

When they observed ZERO errors in such a test run of 20,000 marks, i.e., a given machine correctly read ALL 20,000 marks, the DOST concluded that the accuracy rating of the machine was 100%. Indeed, in the Ombudsman's Report we find:
(D) Engineer Rolando Viloria stated that as the Chairman of the DOST Technical Evaluation Committee he issued the Tests Certifications attesting to the fact that the ACMs of Mega Pacific had obtained a 100% accuracy rating during the verification tests.
The MPC-supplied automated counting machines (ACMs) are based on Optical Mark Reader (OMR) technology, similar to that used by generations of students taking multiple choice exams. When Sec. Alabastro stated that "the ACMs will be tested to read only 20,000 marks and not 200,000 marks" it is because 99.995% accuracy implies that every time the machine is fed a mark to read, there is a 1-in-20,000 probability that it will make an optical mark reading ERROR, while an accuracy rating of 99.9995% corresponds to a 1-in-200,000 probability of an erroneous reading. (An erroneous reading is one in which the machine either thinks there is a filled-in mark when there is none in reality, or thinks there is no filled-in mark when there is.)
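The conversion between an accuracy rating and a "1-in-N" error probability can be checked with a few lines of Python. This is only an illustrative sketch: the function names one_in_n and expected_errors are made up for this example, and exact fractions are used so that no rounding creeps into the arithmetic.

```python
from fractions import Fraction


def one_in_n(accuracy_percent: str) -> Fraction:
    """Return N such that the per-mark error probability is 1-in-N.

    The accuracy is passed as a decimal string (e.g. "99.995") so it can
    be converted to an exact fraction.
    """
    error_percent = 100 - Fraction(accuracy_percent)
    return Fraction(100) / error_percent


def expected_errors(accuracy_percent: str, marks: int) -> Fraction:
    """Average number of misread marks in a run of `marks` test marks."""
    return marks / one_in_n(accuracy_percent)


print(one_in_n("99.995"))                  # 20000
print(one_in_n("99.9995"))                 # 200000
print(expected_errors("99.995", 20_000))   # 1
```

In other words, a machine that only just meets the 99.995% requirement would be expected to misread about one mark, on average, in a 20,000-mark run.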

But was the DOST mathematically justified in claiming an accuracy rating of 99.995% (or even 100%) because a certain automated counting machine they tested was able to read without error a set of 20,000 optical test marks?

NO!! Here is an authoritative reference. And anyone who works with statistics, such as opinion surveyors and quality control professionals, knows why.

A test run with 20,000 optical test marks does not have enough STATISTICAL PRECISION to measure the required 99.995% accuracy rating because the MARGIN OF ERROR in such a measurement is TOO LARGE!

In order to measure an accuracy of 99.995%, DOST must be able to distinguish the difference between 99.995% and 99.994%. That means that the STATISTICAL ERROR in their measurement of accuracy must be smaller than around 0.0005%, which is half of the 0.001% gap between those two figures.

Most people are aware that the statistical margin of error in SWS surveys is equal to one divided by the square root of the NUMBER of respondents. So for their standard 1200 respondent surveys, the margin of error is plus or minus 2.89% which is usually quoted in news stories as plus or minus 3%.

If SWS used 20,000 respondents instead of 1200 respondents, their margin of error would be equal to plus or minus 0.7%.

And that number is also the margin of error of the DOST tests on the ACMs of Comelec because they fed each machine 20,000 test marks and counted the error rate.
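The figures quoted above follow from the one-over-square-root rule and can be reproduced with a short Python sketch. The function name rule_of_thumb_margin is hypothetical, and the rule itself is the usual survey shortcut, which corresponds to the worst-case margin at roughly 95% confidence for a proportion near 50%; that assumption is flagged in the code.

```python
import math


def rule_of_thumb_margin(n: int) -> float:
    """Margin of error as a fraction, using the 1/sqrt(n) rule of thumb.

    Assumption: this is the worst-case (proportion near 50%) margin at
    roughly 95% confidence, as commonly quoted for opinion surveys.
    """
    return 1.0 / math.sqrt(n)


for n in (1_200, 20_000):
    margin_percent = rule_of_thumb_margin(n) * 100
    print(f"n = {n}: plus or minus {margin_percent:.2f}%")
```

Under this rule the margin shrinks only with the square root of the sample size, so going from 1,200 to 20,000 samples (about 17 times more) narrows the margin by only about a factor of four.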

So the correct way for Alabastro or Viloria to have reported the DOST Test results was this: the counting machines were measured to have an accuracy rating of 100% plus or minus 0.7 percent. The DOST tests do NOT prove in a statistically acceptable fashion that the ACMs meet the required accuracy rating, even at the 99.995% level.

The correct number of test marks that should have been fed to each machine in order to be able to measure an accuracy rating of 99.995% plus or minus 0.0005% at confidence level 95% (Z=2) should have been:

n = 200 billion optical test marks with E = 1 million or fewer errors

This may seem like a lot of testing, but that is the consequence of even the REDUCED accuracy rating of 99.995%! With a proper appreciation of the challenge of such a verification test as DOST agreed to undertake, and with proper planning, such qualification testing could have been accomplished. Relaxing the confidence level to 68% (Z=1) would reduce the required number of test marks to 50 billion while maintaining the required .0005% margin of error.
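As a rough cross-check on the scale involved, the one-over-square-root rule quoted earlier in this post can be inverted to get the number of marks needed for a given margin of error. This is only a sketch under that rule of thumb: the function name marks_needed is hypothetical, and because the exact constant depends on the convention used for Z and for the assumed proportion, the result need not match the figures quoted above. What it does show is that a 0.0005% margin calls for tens of billions of marks.

```python
import math
from fractions import Fraction


def marks_needed(margin_percent: str) -> int:
    """Smallest n for which 1/sqrt(n) is at most the requested margin.

    Assumption: the 1/sqrt(n) rule of thumb (worst case, roughly 95%
    confidence). The margin is given as a decimal string in percent and
    handled as an exact fraction to avoid floating-point rounding.
    """
    margin = Fraction(margin_percent) / 100
    return math.ceil(1 / margin**2)


print(f"{marks_needed('0.0005'):,}")   # 40,000,000,000
print(f"{marks_needed('0.7'):,}")      # 20,409
```

The second line is a sanity check: a margin of 0.7% comes out at roughly 20,000 marks, which is the size of the DOST test runs.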

CONCLUSION: It is possible that these ACMs DO have an accuracy even better than 99.995%, perhaps they could meet the original 99.9995% requirement. Or they could all be REJECTS at 99.3% which is within the margin of error of a 20,000 mark test from the reported 100% accuracy rating.

Based on testimony just revealed in the Ombudsman's Supplemental Report, the DOST did not conduct properly designed statistical tests to measure exactly what the accuracy rating of these ACMs is.
One thing is for sure: 20,000 test marks was not enough to prove at 95% confidence level that a given ACM had an accuracy rating of 99.995% because the test was "TOO BLUNT" -- it lacked the statistical precision required.


postigo luna said...

As to your authoritative reference, here's what it says at the very top of that page:

"This article or section does not cite its references or sources.
You can help Wikipedia by introducing appropriate citations."

Can't be that authoritative if even the Wikipedia (a work that anyone can contribute virtually anything to, for the benefit of readers) feels the need for a disclaimer that, in effect, says: caveat lector - reader beware, this article hasn't been validated.

Be that as it may, I notice you only went so far as to say that the tests MAY have been inconclusive. I guess, if like you, we can trash the presumption of regularity in the DOST's performance of its duties, then, yeah, I guess you've made a devastating point.

But the test parameters were agreed upon by all stakeholders, DJB. You are attacking the statistical methodology, I think: a methodology that was accepted by technical representatives from all sides. And it wasn't just the DOST at those tests or scrutinizing those results either. Engineers were there - some of them may have even been in the same profession you were in. So, I suppose they were all in on the conspiracy to pass off these inconclusively tested machines as a-ok?

kulas said...

DJB, your stat analysis is excellent. Perhaps it could still be of use if anyone insists on continuing to use the ACMs. The SC has already said that the ACMs can no longer be used. The case is closed.

Rizalist said...

Hi! thanks for indulging me here. The reference contains all the mathematical analysis I had to leave out. It's a great article on "How to Judge a Fair Coin" highly educational for anyone who really wants to understand statistical surveys and product testing. Which I see will become a critical aspect of public life from now on. Automation means we must all understand STATISTICS.

Here the INCONTROVERTIBLE and GLARING fact, revealed by the Ombudsman's own report, is that the DOST tests did not contain enough test marks to produce the STATISTICAL PRECISION required to tell the difference between a failed machine or a qualified machine.

That is the conclusion I draw from the declaration of DOST to the Ombudsman, and I ask anyone who disagrees with my scientific analysis to point out why these observations and/or conclusions are wrong.

I say a 20,000 mark test could only produce at best a margin of error of plus or minus 0.7% in the accuracy rating. That means DOST could not tell the difference between 99.995% accurate and 99.295% accurate.

It could not have accomplished what the tests were supposed to: distinguish those machines that pass the requirements from those that don't.

The tests have a fatal DESIGN flaw. Not enough test marks!

Rizalist said...

Yes indeed I am attacking the statistical methodology in so far as it is being used to further a patently political end. But you see, it is as if SWS conducted a survey with just 12 respondents, instead of 1200, and then made statistical claims about such an inadequate random sample set.

I am taking seriously what is offered in testimony to the Ombudsman as FACTS of the case and following her lead in RE-TRYING the entire case. But these facts have consequences to our view. One notes how the Ombudsman's Report is a re-hash of the defense offered by Comelec when the case was adjudicated, and by its defenders since then.

But here I am invoking Laws that were not written by the hand of man. I am invoking laws that call into question the conclusions being urged by the Ombudsman. Conclusions that are here exposed to be wrong.

But I have an open mind to FACTS. If there are FACTS of which I am not aware please supply them for me to take into account.

And ask any statistician about my main point that THERE WERE NOT ENOUGH TEST MARKS to achieve the required statistical precision to measure 99.995% accuracy.

postigo luna said...

what other facts do you want, DJB? all the facts have been laid bare - isn't that the reason why you were able to come up with this stat-analysis? but here's another fact - neither the COMELEC nor the DOST designed the test in a vacuum. IT engineers, software designers, even the occasional educated kibitzer were present. None of them brought this up. I wish they had, so that we don't need to be arguing this point now.

Neither did the Supreme Court. And this is exactly what I was saying when I deplored the fact that the SC didn't get techies to inform their decision. If they had, then the ventilation of issues would have been more thorough and, ultimately, more satisfying. However, the fact that you're bringing this up now doesn't change the fact that they didn't.

Nor do your arguments conclusively point to any wrongdoing, do they? Assuming you are right, the fact remains that the standards used were agreed upon - and therefore binding - on all the disinterested overseers from all sides. The standards were good enough to convince them, even if those standards don't convince you.

And not even the petitioners are questioning the DOST's findings. To be accurate, they are arguing that standards shouldn't have been changed midstream, NOT that the standards actually used were faulty - just that they shouldn't have been changed. There's a huge difference.

Oh and, by the way, the Ombudsman didn't RE-TRY the case. That case was being tried for the first time. Remember the SC is no trier of facts. The Ombudsman was the first adjudicatory body to have actually based its findings on facts. The FIO based her report (which was released unsigned) on statements made to her by Gus and Maricor, without opportunity for the COMELEC to present rebutting evidence; the Senate practically mirrored the SC's decision - almost to the last comma; and the SC, well, the SC is no trier of facts, like they always say, so if you don't base your decision on facts, aren't you building on sand?

Rizalist said...

Fair enough! The discussion we are really begging is: were there elements of a crime involved in awarding the contract to MPC? And what are the FACTS that point to a probable cause to prosecute the Comelec?

I assert that notwithstanding her Conclusions, the OMB's Supplemental Resolution actually contains many FACTS in the form of direct TESTIMONIES by the principal players, that reveal a pattern of manifest partiality, evident bad faith or gross inexcusable negligence.

Let me give you just one example of EACH, with the facts taken from the OMB's own report, but ignored in its conclusions.

EVIDENT BAD FAITH: The Comelec's Official Request for Proposal specifically required a 99.9995% accuracy rating for any acceptable system. This requirement was later decided by Comelec to be unachievable and so it changed the requirement to 99.995% AFTER the eligibility phase had already either discouraged or eliminated other potential bidders or offerors. That the Comelec did not then restart the entire bidding process with a revised RFP was EVIDENT BAD FAITH!

MANIFEST PARTIALITY: Both MPC and TIMC failed the DOST technical evaluation tests with 8 and 12 KEY REQUIREMENTS not met, respectively. Arguing that the failures of TIMC were somehow more severe than those of MPC, the Comelec awarded the contract to MPC anyway, which UNDENIABLY failed the DOST eval tests. This partiality became even more MANIFEST when one realizes that by law and morality Comelec should have declared a failure of the bidding process and started over.

GROSS, INEXCUSABLE NEGLIGENCE: In Quibal vs. Sandiganbayan the Supreme Court defined this concept: Gross negligence is the pursuit of a course of conduct which would naturally and reasonably result in injury to the government or undeserved benefit to private party. It is an utter disregard of or conscious indifference to consequences. In cases involving public officials, there is gross negligence when a breach of duty is flagrant and palpable. The acts of Comelec in their entire 2.5 billion peso modernization program smack of this very thing...for look, they have not a blasted thing to show for 2.5 billion pesos, heedless and indifferent as they have been in commissions and omissions.

To me, the more I read the Ombudsman's Report, the more convinced I am that there IS criminal liability being covered up and whitewashed. The FACTS are in the testimonies in her own report!