How do things usually go so wrong with data? The following is an example of one
typical error that occurs in data analysis. First, how are the data received? In
this example, you received a single Excel file. Although you may receive only one
file, it may originally have come from multiple files. For example, financial
information may be kept separately from personal information: financial
information (such as payment status) would be kept in a company's finance
department, while personal information might be kept in its marketing
department. On some projects I have received over 10 files, with hundreds of
thousands of records, far too large to work with comfortably in Excel. The data
then have to be combined. The next few pages illustrate how easily something can
go wrong.
What happened?
The program ran, and no errors were written to the SAS log. The SAS log is where a SAS programmer goes to check whether a SAS program ran correctly. The log reveals some mistakes, but not all of them. Here, the SAS programmer did not sort the files before merging them, and did not specify how to merge them (in SAS, with a BY statement naming the ID variable). As a result, SAS merged the files by record order: the first record in file "A" was matched with the first record in file "B", the second record with the second record, and so on. The correct way to match the records in these files is by ID number, which is what needed to be done to merge them properly.
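To make the failure concrete, here is a minimal sketch in Python rather than SAS, contrasting a merge by record order with a merge by ID. The records, the field names `id`, `name` and `paid`, and the helper functions are all hypothetical and exist only for illustration. Note that the positional merge raises no error at all, which mirrors the clean SAS log in the example. The `right_id` field is an assumption added only so the mismatch can be detected afterwards.

```python
# Hypothetical data: personal records (e.g. from marketing) and financial
# records (e.g. from finance), delivered in different orders.
personal = [
    {"id": 101, "name": "Ann"},
    {"id": 102, "name": "Bob"},
    {"id": 103, "name": "Cy"},
]
finance = [
    {"id": 103, "paid": True},
    {"id": 101, "paid": False},
    {"id": 102, "paid": True},
]


def merge_by_position(left, right):
    """Pair the nth left record with the nth right record, ignoring IDs.

    This mimics combining files by record order. zip() also stops at the
    shorter list, silently dropping any extra records.
    """
    merged = []
    for a, b in zip(left, right):
        row = dict(a)
        row.update({k: v for k, v in b.items() if k != "id"})
        row["right_id"] = b["id"]  # kept only so mismatches can be checked
        merged.append(row)
    return merged


def merge_by_id(left, right):
    """Match records on their "id" field (assumes IDs are unique in right)."""
    lookup = {b["id"]: b for b in right}
    merged = []
    for a in left:
        b = lookup.get(a["id"])
        if b is None:
            continue  # no matching record; a real project should report these
        row = dict(a)
        row.update({k: v for k, v in b.items() if k != "id"})
        row["right_id"] = b["id"]
        merged.append(row)
    return merged


def mismatched_ids(merged):
    """Return the IDs of merged rows whose two source records disagree."""
    return [row["id"] for row in merged if row["id"] != row["right_id"]]


by_position = merge_by_position(personal, finance)
by_id = merge_by_id(personal, finance)

print(by_position[0])               # {'id': 101, 'name': 'Ann', 'paid': True, 'right_id': 103}
print(mismatched_ids(by_position))  # [101, 102, 103]
print(mismatched_ids(by_id))        # []
```

In the positional result, Ann is silently given customer 103's payment status, yet every row looks plausible on its own. The keyed version uses a dictionary lookup, so unlike a SAS BY merge it does not require the files to be sorted first.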
How would you know what happened?
Often you would not! Good management of analytical projects is not about always
finding what went wrong. Part of good management is noticing that something looks
wrong and then saying: we need to determine why this looks wrong, or strange. As
shown, when there is one noticeable mistake, there are often other mistakes.
Imagine a criminal who has been caught for the first time and stands in a
courtroom before a judge. He says this is the first time he has done something
wrong. The judge is surely thinking that this is the first time he was caught,
not his first criminal act. This is exactly what people think when they see a
presentation containing a mistake. If the results look possible but not
probable, then the most probable explanation is that there is a mistake
somewhere.