Thursday, November 26, 2009

The Data Is In

In my days as an engineering student I took a course entitled Engineering Methods. One interesting lecture was on dealing with data. The professor presented us with a vernier caliper and an object of unknown length. This was in a lecture hall containing 150 or so of us. He gave a quick lecture on how to use and read the caliper and then passed around the caliper, the object, and a sheet of paper on which we were to record our measurements.

The next day he came in and presented us with copies of the raw data. He then pointed out some interesting facts about it. I cannot recall the length of the object, but let us suppose it was found to be 16.8 cm. While scanning the raw data we all noted a few recorded measurements such as 170 cm, 1.6 cm, or 8.5 cm, whereas most readings were close to the object's 16.8 cm length. He then launched into a discussion of those fishy recordings. He explained that further analysis would exclude them: we do not need to measure the object to know it is not 8.5 cm, and the other two outliers could be explained as someone misplacing a decimal point. Thus he introduced us to the concept of data outliers.

Subsequent lab classes I took (after dropping out as an engineering student and entering the physics program) always discussed the handling of outliers and the raw data. Outliers were not to be dropped from the reported raw data, and in order to exclude an outlier from the analysis you had to provide justification. That is, proper experiment reporting dictated that all data be disclosed, including data we judged to be suspect.
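As a rough illustration of that reporting practice, here is a minimal Python sketch. Only the 16.8, 170, 1.6, and 8.5 cm figures come from the story above; the other readings are made up for the example. The flagging rule (a modified z-score built on the median and the median absolute deviation, with a cutoff of 3.5) is one common rule of thumb chosen for this sketch, not the method the professor used. The point is that every raw reading stays in the report and each suspect reading is marked rather than deleted.

```python
from statistics import median

# Hypothetical readings in cm: the suspect values come from the story,
# the rest are invented for illustration.
readings_cm = [16.8, 16.7, 16.9, 16.8, 170.0, 1.6, 8.5, 16.75, 16.85]


def flag_outliers(values, cutoff=3.5):
    """Return (value, is_suspect) pairs without dropping any reading.

    Uses the modified z-score: 0.6745 * |x - median| / MAD.
    The cutoff of 3.5 is a conventional choice, assumed here.
    """
    med = median(values)
    mad = median([abs(x - med) for x in values])
    report = []
    for x in values:
        if mad == 0:
            # All typical readings are identical; anything different is suspect.
            suspect = x != med
        else:
            suspect = 0.6745 * abs(x - med) / mad > cutoff
        report.append((x, suspect))
    return report


report = flag_outliers(readings_cm)

# Raw data is reported in full; suspect readings are marked, not removed.
for value, suspect in report:
    note = "suspect, excluded from analysis" if suspect else "kept"
    print(f"{value:8.2f} cm  {note}")

kept = [value for value, suspect in report if not suspect]
flagged = [value for value, suspect in report if suspect]
```

A flag from a rule like this is only a prompt for scrutiny. The written justification for excluding a reading (a misplaced decimal point, say) still has to come from the experimenter.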

Now, most college-level experimentation replicates historical experiments with well-known results. While pride may be lost in not getting the expected result, the important part of the process is to learn experimentation and how to properly present the results. I did not lose any sleep when the results I calculated from some laser experimentation did not turn out as expected. However, proper scientific process demands you present the results even if you suspect flaws. Show your data, good and bad, justify the included and excluded data, present your calculations and analysis, and then present your conclusions.

I suppose in my earlier example one might object that throwing out the 8.5 cm reading based on the looks of the object is improper, and that it should be included in the analysis. That is not a wholly indefensible position, and a lot of scientific debate goes on over such points, especially in areas where we have a much harder time distinguishing between good and fishy data.


rc said...

Just a couple of questions: Was this article related to the CRU scandal? If so, is this some attempted justification of the blatant curve fitting and data massaging that apparently occurred?

Marcus Aurelius said...


Good read between the lines on the first observation -- because it is. I'll ask you to stay tuned for my take on the latter question!

Also, I suggest all to read Wretchard's latest post "More AGW Controversy" at The Belmont Club.

GabbyD said...

what is the CRU scandal?

rc said...

Thanks Marcus,

"Also, I suggest all to read Wretchard's latest post "More AGW Controversy" at The Belmont Club."

Yep...hence my questions. Belmont Club has been all over it. I've been following this very closely, as I'm sure you have.

Marcus Aurelius said...


Sorry for not getting back to you on this sooner.

CRU is the Climatic Research Unit at the University of East Anglia in the UK. The term "CRU Hack" refers to the package of e-mails, programs, documents, etc. that a person unknown (at least to me, at this moment) made public.

Global warming skeptics point at the release as evidence that less than stellar science is being used to justify dramatic policy changes.

Those pushing human-caused global warming say that the claims of the skeptics are not supported by the release (if they address the claims at all).

rc said...

"Global warming skeptics point at the release as evidence that less than stellar science is being used to justify dramatic policy changes"

Hehehehe. That's one way of putting it, I suppose.