Classifying Data Quality Problems
Lionel A. Galway and Christopher H. Hanks
It is easy to elicit anecdotes about poor data and their effects, but much harder to come up with a general yet precise definition of what it means for data to be “bad.” One line of academic research has attempted to determine the attributes of “good” data. Another has looked at various aspects of data and evaluated how those aspects affect quality.
While these studies have attracted some interest, most researchers have settled on a pragmatic, usage-based definition of data quality. In this view, which we will adopt in this article, data quality can only be evaluated in the context of a use or set of uses. It follows that data appropriate for one use may not be appropriate for another. One of the primary reasons why data quality problems occur is that data are used for purposes not intended or envisioned when they were designed or collected.
Although the full report discusses the accuracy, timeliness, definition, and consistency of individual data elements, among other attributes, the starting point is always a set of current or planned uses. From there, we ask how a data element, as currently defined, collected, or aggregated, fails to meet the requirements of those uses.