Caution — Relationship Work Ahead
July 2009: Originally published in IDQ Newsletter Vol 5 Issue 3
Michael P. Meier
In all of the discussion about detecting, repairing, and eliminating causes of data failures, the issue of relationship is mentioned rarely, if at all. If mentioned, it is always in the context of “relational integrity,” which has come to mean verifying that a “foreign key” value is traceable to an actual record in the “related” table. This job is often delegated to the DBMS.
This being the case, all that needs to be done (so goes the current thinking) is to document a relationship in the data model, from which the database schema is generated, and make sure integrity checking is turned on. This article shows why this approach is destined for failure and suggests some ways to uncover and eradicate this most subtle, most devious, and most costly of all data quality problems.
The Relationship Pattern (a meta model)
It will help to ensure that we are all thinking of the same thing when we talk about relationship. That line connecting two entities on a data model has meaning — it represents something real. The first mistake we make is thinking that it represent a foreign key. It might be helpful to think of a formalized business relationship as a contract. A contract details the expectations of all the parties and all parties are obligated to fulfill those expectations.
The purpose of the universal patterns for data modeling, as Silverston and Agnew  remind us, is as a tool to extend and develop just about any type of data model. One of the drawbacks of a data modeling product is that it is too often viewed as a static artifact, much like a roadmap.
Figure 1. The Relationship Pattern (a Meta Model)
What does the data model lack that renders it lifeless? What prompts us to treat it as an artifact? It lacks any genuine reflection of relationship. No human endeavor can be described without referring to the relationships that bring it to life. Those one-dimensional, annotated symbols on the data model bear no resemblance to the thing they are intended to represent. We require a much fuller grasp in order to imagine the model alive.
Let’s take a stab at a universal pattern for relationships (Figure 1). Certainly, if we can shed some light on relationships and bring the concept to life, it’s just possible that a substantial amount of stress can be lifted from our lives. It’s also possible we can relieve similar stress on our data systems.
Description and Purpose of Relationships
Beginning with the properties of the relationship class itself, we note that it has both a description and a purpose. The description will lay out the basic ground rules as in:
A baseball team will consist of a sufficient number of members to compete under the rules of the games of baseball (league rules being the final authority) with a maximum of nine active players and a sufficient number of reserve players to guarantee that the required minimum number is always available.
You might wish to add additional descriptions regarding the various positions and skills that must be included or suggest a set of roles without which the baseball team (relationship) will be incomplete. That’s up to you as a user of the pattern.
When a new member joins the team, he or she will accept a role and subscribe to the purpose. The purpose is an essential part of the relationship since, without agreement as to purpose there will never be agreement as to role expectations. For example, a team whose purpose is to return a profit to the owner(s) will have far different expectations than the team whose purpose is to win a championship.
Even in a relationship such as marriage
def: Marriage is a lifetime commitment to join the lives of two persons.
it is critical that the parties agree on a purpose at the time the relationship is instantiated. If one person’s purpose is to raise a family and the other’s purpose is to live a life of romance, some negotiation is going to have to take place to make the relationship last. Clearly, we have pared the definition down to the bare bones and not all would agree with the definition as it stands here.
One further example will help illustrate the importance of purpose and definition to a relationship. Nearly all of us have experienced an attraction to another person. This is almost always accompanied by anxiety, uncertainty, indecision and, sometimes, outright fear as we realize that this one is different. What are the questions we need answers to?
- What name shall I give to this?
- What name is he/she giving to this?
- What shall I call myself in relation to him/her?
- What shall I call him/her in relation to me (friend, boyfriend…)
- What do I want?
- What does he/she want?
The answers to these questions suggest expectations and, if the parties are giving incompatible answers, their expectations can’t be aligned. Without serious negotiation, the relationship is doomed.
Bystanders as Non-Parties to the Relationship
The previous section illustrates the importance of a clear definition to those who are parties to the relationship. But what about bystander roles? Someone who is not a party to the relationship may be affected by it in some way. In the case of a marriage relationship, for example, we will have to introduce in-law roles and manage those expectations.
A pure bystander will not be in any formal relationship with any of our parties and for that reason cannot be allowed to have any expectation regarding the relationship. They should only be considered to the extent that the “bystander” entity is involved with one of the parties in yet another relationship of some kind. For example, I am a bystander in the conveyance of billions of bailout dollars to General Motors, but if I purchase a vehicle from GM, I am a party with a role and expectations.
Roles in Relationships
The complete picture of any relationship consists of potentially many roles. It also seems likely that a given role might be found in more than one relationship. For example, buyer might be a role in a retail relationship, a real estate transaction, or in a contractual relationship. To bring this home, man or male are roles found in many relationships, as are woman and female. An important aside: does your model include one of these as an entity? These roles are often confused with entity types and may often be found as categories. Do you see how the male and female roles are different than the values for a gender attribute that we might assign to a person entity type?
The “traditional” line segment(s) are used here only to convey information about cardinality and optionality. A line segment, no matter how richly adorned with avian feet, bars, or circles—even labels, cannot convey the meaning of a relationship. At best it can only convey the idea of a relationship, which must be fleshed out if we are to make our model really useful.
Role Expectations in Relationships
Our pattern denotes the many-to-many situation that exists with respect to relationships and roles by means of an entity type called role-expectation. Every relationship involves the expectations of the participants. When expectations are clear everything can flow along smoothly. When they are not clearly articulated or when they are not put into words at all, the parties have virtually no chance to create a positive and productive relationship.
A specific entity (an instance) may be filling one or more roles in various relationships. A role is filled by one or more entity instances at any point in time and over time. We all understand that the role of husband can have a history over the lifetime of a marriage relationship. Similarly, the role of buyer in a contractual relationship may be held by multiple individuals at any time and is almost certain to be filled by more than one person over the lifetime of the contract.
What we are doing is generalizing.
We have been warned about generalization since childhood and yet it is so useful that we risk the problems in order to reap the reward of being able to simplify the way we deal with the world.
Be warned: although all marriages share some critical roles and the expectations for those roles are formalized to the extent that audiences of thousands will laugh uproariously at a joke that references one of the expectations, it is still dangerous to rely on that expectation in your particular instance. It certainly aids in negotiating expectations, though, to have a starting point already defined.
Implications for Data Quality
Have you ever scanned a page or a screen full of data without the benefit of any column headings or other documentation to provide a frame of reference? Amazingly, you begin to develop an idea for a plausible interpretation of the various kinds of data and their relationships. If everything is good, you begin to recognize that this column is probably customer numbers and that one is the year, quarter, month. This columns looks like a dollar amount and that number is probably a quantity purchased. It all starts to make sense.
Then, in the real world, you see a row that doesn’t fit the pattern. Now what? There are a host of possible next steps, but no matter which one you choose, the bloom is off the rose. You have lost some amount of confidence in the data.
What if you suddenly discover the column headings at this point? What if that one row still sticks out? Remember, no single row-column value looks wrong. It’s only when they’re put together that a line or row looks wrong. You have just discovered a mis-documented relationship.
In the days of paper filing, a document or a folder filed in the wrong drawer could be lost literally forever. The same thing can happen in the virtual world. The only advantage we have today is that, if we can describe the anomaly we’re looking for, the search can proceed very quickly. Often, though, we can’t describe the anomaly. The values are all correct and current—they just don’t belong together. We know that the business rules we are aware of don’t permit the configuration we’re seeing.
We start to trace the origin of the offending record and that’s when we hear, “Oh, I thought that meant…”
We ask, “How long have you been doing this?” and “How many times have you entered the data like this?” We finally realize that it isn’t this person’s fault and that they aren’t the only one with a different understanding.
Poorly understood and therefore poorly implemented data relationships lead to the most serious kind of problems. The problem goes by the name of “that’s not what I meant” or “I thought it meant” or, most often, “Ohhhh.” The symptom is records indicating relationships that “can’t be” or “shouldn’t be possible.”
This kind of data presents a reporting nightmare. There is simply no way to be sure how widespread the problem is and, once discovered, it throws an entire data source into doubt.
Contrast this relationship problem with the more common data quality problems in which a keystroke error causes a phone number, a birth date, an address, social security number or a drivers license number to be invalid. Those mistakes are generally repairable. Many, if not most, data quality errors can be trapped and prevented with better attention given to the user interface and edit checks or even (perish the thought) properly data typed database fields and well-defined constraints. Text types are not always the best solution — only the easiest.
A misunderstood relationship is different in that the data frequently can’t be repaired. Even though some instances may stand out as wrong, many others will escape detection until we rely on them for something critical.
For example, an ambulatory care facility tracks each patient’s primary care physician. A patient is “allowed” by the data model to have exactly one physician who is their “primary care physician” or PCP. A field, labeled PCP, is on the registration screen used by the receptionists in each department to check in patients. There is no guidance on the screen and the training for new receptionists doesn’t cover this field specifically. For whatever reason, one specialty department or one or two receptionists, who may move from department to department, make it their practice to select as the PCP the name of the doctor who is seeing the patient for the current visit. Their decision is reasonable given that the drop-down list from which they select the PCP includes all physicians, NPs and PAs in the organization.
After a relatively short time, a substantial portion of the patient population has a documented PCP relationship with a physician who is not in a primary care specialty. The bottom line is we cannot say with certainty who anyone’s PCP is.
How can our pattern help? If we have implemented role-expectations in our systems, we will have the ability to validate that a given entity instance actually has a role in the relationship. We’ll have documented the expectation for that role and be able to enforce the current expectations. We have to acknowledge that some expectation in the real world can probably never be adequately specified and, indeed, may change with different participants according to the day of week or time of day. For that reason, we should focus on expectations that are critical to the value of the data itself and can be enforced.
How to capture, store and use “business rules” has always been a point of debate. A few of the more common ways include table-driven rules engines, triggers, constraints, and software (program) controls.
Virtually all of the business rules you might want to define are really role-expectations that are part of a relationship. Using our pattern will greatly improve our ability to find, document, and enforce them. The DBMS cannot help with this and the critical component is an actual relationship between two human roles, data architect and software developer. These two roles must align their expectations regarding the “data model” so that both are getting what they need. Failure of this relationship guarantees continued loss of value in the data resource.
© 2009 Michael Meier
About the Author
Michael Meier (M.S., Cardinal Stritch Univ.; B.S., Iowa State Univ.) has been a “data person” for more than 20 years in industries as varied as Defense, Manufacturing, Transportation, and Health Care. He currently does business at WhiteLake Data Management (www.m2dxtx.com) as a coach focusing on governance and quality issues. His extensive background in software development methods and tools helps him understand key roles and expectations and allows him to function in a mediation role in problem scenarios.