Data Quality Rules: Rules for Historical Data
Most real-world objects change over time. Newborn babies grow into playful toddlers, love-stricken teenagers, busy adults, and finally wise matriarchs and patriarchs. Employees' positions change over time, their skills increase, and so, hopefully, do their salaries. Stock markets fluctuate, product sales ebb and flow, corporate profits vary, empires rise and fall, and even celestial bodies move about in an infinite dance of time. We use the term time-dependent attribute to designate any object characteristic that changes over time.
The databases charged with tracking object attributes inevitably have to contend with this time-dependency of the data. Historical data comprise the majority of data in both operational systems and data warehouses. They are also the most error-prone. There is always a chance that we miss parts of the history during data collection or timestamp the collected records incorrectly. Historical data also often spend years inside databases and undergo many transformations, which provides plenty of opportunity for data corruption and decay. This combination of abundance, critical importance, and error-affinity makes historical data the primary target in any data quality assessment project.
The good news is that historical data also offer great opportunities for validation. Both the timestamps and values of time-dependent attributes usually follow predictable patterns that can be checked using data quality rules.
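As a rough illustration of what such rules might look like, the sketch below checks one object's value history for two kinds of pattern violations: timestamps that repeat or leave a gap, and values that jump implausibly between consecutive records. The function name check_history, the thresholds max_gap_days and max_change_ratio, and the sample salary data are all hypothetical, chosen only for this example; real rules would depend on the attribute being tracked.

```python
from datetime import date


def check_history(records, max_gap_days=366, max_change_ratio=0.5):
    """Return (index, message) findings for one object's value history.

    records: list of (effective_date, value) tuples, expected in date order.
    Both thresholds are illustrative assumptions, not standard values.
    """
    findings = []
    for i in range(1, len(records)):
        prev_date, prev_value = records[i - 1]
        curr_date, curr_value = records[i]

        # Timestamp pattern: each record must be later than the previous one,
        # and no more than max_gap_days after it.
        gap = (curr_date - prev_date).days
        if gap <= 0:
            findings.append((i, "timestamp not after previous record"))
        elif gap > max_gap_days:
            findings.append((i, "gap in history"))

        # Value pattern: the relative change between consecutive records
        # must stay within max_change_ratio.
        if prev_value and abs(curr_value - prev_value) / abs(prev_value) > max_change_ratio:
            findings.append((i, "value change exceeds threshold"))
    return findings


# Made-up annual salary history with a missing year and a suspicious record.
salary_history = [
    (date(2019, 1, 1), 50000),
    (date(2020, 1, 1), 52000),
    (date(2022, 1, 1), 54000),   # the 2021 record is missing
    (date(2022, 1, 1), 540000),  # repeated date and an implausible value
]
findings = check_history(salary_history)
for index, message in findings:
    print(index, message)
```

Here the third record is flagged for the gap before it, and the fourth for both its repeated timestamp and its tenfold value jump.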