Quality of Derived Data. Part 2: Ambiguities of Data Definitions and Collections
The most important facts in our world are unknowable. Decision-makers often must deal with estimates. Yet, many decision-makers don’t want to deal with the issues or complexities of metrics. They want simple, easy-to-understand conceptual models and ratifying measures. And some older executives favor their intuition over empirical data.
This inclination toward simplicity is why a truly accurate, logical data model of the enterprise is so frightening to executives and some project managers. Reality is always more complex than they had imagined, or want to believe.
Similarly, many metrics of societal, economic, or enterprise behavior are unknowable (at least at the moment), and any expression of such a metric must be qualified with more definitions and caveats (often relegated to footnotes) than most readers want to know.
While data quality practitioners traditionally have focused on granular, operational data, the data that most executives are shown are aggregated, summarized, normalized, and otherwise processed (or manipulated) in ways those executives may neither understand nor want to understand. But as W. Edwards Deming often said, “The more you know about the weaknesses of a statistic, the more useful it can be to you.”
In a previous installment of this series, I showed how raw data needs to be placed into context and normalized for the factors we wish to exclude, so as to isolate the phenomenon we are most interested in. The many steps of collection, aggregation, and computation offer numerous potential points of failure or loss of quality.
In this article we explore the ambiguity of definitions of facts. Unfortunately, just as much trouble lurks here!
Any fact or statistic is useless without a textual (as well as architectural) definition. And generally, facts also require context to be meaningful. That context can take the form of previous values of the fact, or peer values of the fact (e.g., the time series charts in the previous article). But to be fair and accurate, we must ensure that the definition of each fact is consistent across every dimension over which it is reported. Accounting standards illustrate the point: they ensure (somewhat, but not absolutely) that the definition of “current assets” is consistent from year to year, and across the corporate financial statements we may wish to compare side by side.
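One way to make the consistency requirement concrete is to carry a definition identifier alongside every reported value and check that it does not change across the reporting dimension. The following Python sketch is illustrative only: the field names (metric, period, definition_id), the identifiers "std-A" and "std-B", and all values are hypothetical, not drawn from any real accounting standard or dataset.

```python
from collections import defaultdict


def inconsistent_definitions(observations):
    """Return {metric: sorted definition ids} for every metric that was
    reported under more than one definition."""
    seen = defaultdict(set)
    for obs in observations:
        seen[obs["metric"]].add(obs["definition_id"])
    return {metric: sorted(ids) for metric, ids in seen.items() if len(ids) > 1}


# Hypothetical observations: the definition of "current_assets" changes in 2021,
# so a year-over-year comparison across that boundary would be misleading.
observations = [
    {"metric": "current_assets", "period": 2019, "definition_id": "std-A", "value": 120.0},
    {"metric": "current_assets", "period": 2020, "definition_id": "std-A", "value": 131.5},
    {"metric": "current_assets", "period": 2021, "definition_id": "std-B", "value": 98.2},
    {"metric": "headcount", "period": 2020, "definition_id": "hr-1", "value": 410},
    {"metric": "headcount", "period": 2021, "definition_id": "hr-1", "value": 432},
]

problems = inconsistent_definitions(observations)
for metric, ids in problems.items():
    print(f"{metric}: reported under multiple definitions {ids}")
```

A check like this does not say which definition is right. It only flags series where a trend line or peer comparison would silently mix definitions and therefore needs a caveat.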