2.4.2 How do things go wrong? A Possibility.

How do things usually go so wrong with data? The following is an example of one typical error in data analysis. Start with how the data are received. Suppose you received a single Excel file. Although you receive only one file, the data may originally have come from multiple files. For example, financial information may be kept separately from personal information: financial information (such as payment status) would be kept in the finance department of a company, while personal information might be kept in the marketing department. For some projects I have received over 10 files with hundreds of thousands of records, far too many for Excel. The data from these files then have to be combined. The next few pages illustrate how easily something might go wrong.

[Figures: nine illustration pages, images not available]

What happened?

The program ran, and no errors were written to the SAS log. The SAS log is where a SAS programmer goes to check whether a SAS program ran correctly. Some mistakes can be found in the SAS log, but not all. In this case the SAS programmer did not sort the files before merging them. In addition, the SAS programmer did not specify how to merge the files. SAS therefore merged the files according to record order. That is, the first record in file "A" was matched with the first record in file "B", the second record with the second record, and so on. The best way to match the records in the files is by ID number, which is what needed to be done to merge these files properly.
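The difference between the two kinds of merge can be sketched in a few lines. This is a minimal illustration in Python, not the SAS program discussed above. The records, the field names (`id`, `payment_status`, `name`) and both function names are hypothetical and chosen only for this example.

```python
# Hypothetical data: the same three customers appear in both files,
# but the files are stored in different orders.
finance = [  # file "A": payment status, ordered by ID
    {"id": 101, "payment_status": "paid"},
    {"id": 102, "payment_status": "overdue"},
    {"id": 103, "payment_status": "paid"},
]
personal = [  # file "B": names, NOT ordered by ID
    {"id": 103, "name": "Cruz"},
    {"id": 101, "name": "Adams"},
    {"id": 102, "name": "Baker"},
]


def merge_by_position(a, b):
    """Pair the first record of a with the first record of b, and so on.

    No ID is checked. Where both records carry the same field (here "id"),
    the value from b overwrites the value from a.
    """
    return [{**rec_a, **rec_b} for rec_a, rec_b in zip(a, b)]


def merge_by_id(a, b):
    """Pair each record of a with the record of b that has the same ID."""
    b_by_id = {rec["id"]: rec for rec in b}
    return [
        {**rec_a, **b_by_id[rec_a["id"]]}
        for rec_a in a
        if rec_a["id"] in b_by_id
    ]


wrong = merge_by_position(finance, personal)
right = merge_by_id(finance, personal)

for label, rows in (("by position", wrong), ("by ID", right)):
    print(label)
    for row in rows:
        print("  ", row["id"], row["name"], row["payment_status"])
```

Both merges run without raising an error and both return three records. Only the merge by ID pairs each name with the correct payment status. In the merge by position, Adams is shown as overdue and Baker as paid, the reverse of what the finance file says, and nothing in the output signals that anything went wrong.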

How would you know what happened?

You often would not! Good management is not about always finding exactly what went wrong in an analytical project. Part of good management is seeing that something looks wrong or strange and then saying, "We need to determine why this looks wrong." As shown, when there is one noticeable mistake, there are often other mistakes. Imagine a criminal who is caught for the first time and stands in a courtroom before a judge. He says this is the first time he has done something wrong. The judge is definitely thinking that this is the first time he was caught, not his first criminal act. This is exactly what people think when they see a presentation with a mistake. If the results look possible but not probable, then the probable explanation is that there is a mistake somewhere.