9.2.1 General statements on data analysis projects

This section provides a high-level summary of data analysis projects from the point of view of a consultant. Research questions are what drive a data analysis project. Consultants are often hired because the company has an existing research question that it wants answered. Some questions are openly stated and some are not. Political goals, and questions pertaining to internal politics, are often left unstated, but it is important to be aware that they exist. Data, technique, and presentation are the keys to forming readily understandable and actionable answers to the questions at hand.

The Data

  1. What data do you have, and can the data answer the questions you have?
    1. Do you have the data necessary to answer your questions?
    2. Note: The data available to you partially determine the technique you will use for the data analysis.
  2. Garbage In, Garbage Out (G.I.G.O.)
    1. This is very important. It means that you cannot expect to get good, reliable results from bad data.
    2. In other words: if the data are not good, not accurate, not reliable, etc., you cannot trust the results.
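
As an illustration of checking for "garbage" before any analysis, the following minimal Python sketch counts missing values, out-of-range values, and exact duplicate rows. The function name `quality_report`, the field names, the plausible ranges, and the sample records are all assumptions made up for this example; a real project would define its own checks.

```python
from collections import Counter


def quality_report(records, ranges):
    """Count basic data-quality problems in a list of dict records.

    `ranges` maps a field name to a (low, high) plausible range.
    Both the fields and the ranges are illustrative assumptions.
    """
    report = {
        "rows": len(records),
        "missing": Counter(),
        "out_of_range": Counter(),
        "duplicates": 0,
    }
    seen = set()
    for rec in records:
        # An exact duplicate is a record whose fields and values all match
        # a record that was seen earlier.
        key = tuple(sorted(rec.items(), key=lambda kv: kv[0]))
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
        for field, (low, high) in ranges.items():
            value = rec.get(field)
            if value is None:
                report["missing"][field] += 1
            elif not (low <= value <= high):
                report["out_of_range"][field] += 1
    return report


# Made-up records, for illustration only.
sample = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},
    {"id": 3, "age": 210, "income": 61000},
    {"id": 3, "age": 210, "income": 61000},
]
report = quality_report(sample, {"age": (0, 120), "income": (0, 10_000_000)})
print(report)
```

A report like this does not repair the data. It only indicates how much of the data may be untrustworthy, so that the problem can be discussed before any results are produced.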

The Technique

  1. There are many data analysis techniques.
    1. There is often more than one technique that can be used to answer the same question.
    2. The results from the different techniques often do not differ as much as one might think.
      • This is often true when comparing statistical models, such as multiple linear regression, logistic regression, and decision trees.
  2. The technique is partially determined by the data you have.
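
The observation that different techniques often give similar results can be illustrated with a small sketch. It is not taken from the text: it estimates the slope of a straight line on synthetic data using ordinary least squares and, as a swapped-in second technique, the Theil-Sen estimator (the median of all pairwise slopes). The data, the random seed, and the function names are assumptions chosen for illustration only.

```python
import random
from itertools import combinations
from statistics import mean, median


def ols_slope(x, y):
    """Slope of the ordinary least squares line through (x, y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def theil_sen_slope(x, y):
    """Theil-Sen slope: the median of the slopes over all pairs of points."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return median(slopes)


# Synthetic data (an assumption for this sketch): a line with slope 2
# plus Gaussian noise.
random.seed(0)
x = list(range(50))
y = [2.0 * xi + 5.0 + random.gauss(0, 3) for xi in x]

slope_ols = ols_slope(x, y)
slope_ts = theil_sen_slope(x, y)
print(f"OLS slope:       {slope_ols:.3f}")
print(f"Theil-Sen slope: {slope_ts:.3f}")
```

On well-behaved data such as these, the two estimates should be close to each other. Larger differences tend to appear when the data contain outliers, which is one more reason to inspect the data first.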

The Presentation

  1. The presentation is a very important part of a data analysis project. Sometimes it is the most important part.
  2. A good presentation should support the findings and not just mention them.
    1. The supporting statistics and graphs within the presentation can either be an aid to understanding or create confusion.
    2. Management often relies on the presentation in order to understand the findings from data analysis projects.
      1. Management needs to trust the findings; if the findings are presented poorly, it is difficult to trust them.
      2. A poor presentation can even cause projects to fail. Management will not implement what they do not trust or understand.
      3. A poor presentation or explanation also often leaves management unsure how to interpret and act on the findings from the project.
  3. Unfortunately, many statisticians and computer scientists are weak in this critical area.
    1. They tend to look only at the results and the numbers in the computer output.
    2. This causes many data analysis projects to fail or to be less successful than they could be.

In the opinion of the author, the most important part of a data analysis project is the data collection. Always think about “Garbage in, garbage out” (G.I.G.O.) before collecting the data. For this reason, the next subsection is devoted solely to data collection and investigation in a data analysis project. The subsection after that consists of an example covering the three main parts of a data analysis project: the data, the technique, and the presentation.