The defect numbers kept climbing and nobody could explain why.

Not for lack of trying. The program had a full quality apparatus: anomalies tracked, corrective actions issued, and metrics reviewed in every operating rhythm. It was the standard retrospective machinery that serious programs run and almost no program uses well. Everyone was looking at the data, but the data, it turned out, was only willing to tell us what had already gone wrong.

That was the problem. Not the defects themselves. The measurement system that could only see them after they existed.

I was an industrial engineer on the Space Shuttle program at United Space Alliance in 2007, handed a quality problem that had been sitting in the data longer than anyone wanted to admit. The program was measuring outcomes, meaning what went wrong, when, and how often. What it was not measuring was the upstream conditions that produced those outcomes. On a program where the consequences of failure were irreversible, the difference between a lagging indicator and a leading one was not an academic distinction. It was the difference between managing a system and reacting to it.

So I started looking upstream at the conditions that produced the defects. I was looking for the signal inside the noise that everyone else had decided was background, the thing that, if you caught it early enough, let you intervene before the defect existed rather than after.

I found it in a place nobody expected to find it. Least of all me.

We had a processing facility where technicians were using a standard cleaning agent on certain components, a process that had been approved for years. Nobody had flagged it as a concern, but when I started mapping variance in defect rates against process variables, that cleaning agent kept appearing in a pattern too consistent to be random.

It was not the agent itself. It was what it represented, which was a place where the process had enough variability that small differences in execution produced measurably different outcomes downstream. It was an indicator. What it was indicating was a process that looked controlled and was not.

The constraint is almost never where everyone is looking.

That insight became the foundation of what I eventually called the Product Quality Index, a composite leading indicator methodology that aggregated upstream process signals into a single number predicting portfolio risk months before it appeared in cost or schedule data. I was asked to present it at the World Conference on Quality and Improvement in 2010.

I tell this story not to catalog a career accomplishment, but because it illustrates something about how operators think that I have never seen adequately described in any management framework I have read since.

Organizations under pressure almost universally look in the same place for the problem: the obvious failure point, the reported defect, the metric that is already red. That is the natural human response to a problem, and it is consistently wrong as a diagnostic strategy, because by the time something is visibly broken, the actual cause is usually several steps upstream and the window for low-cost intervention has already closed. When the dashboard turns red, you are already behind.

The operators who consistently outperform their peers are not smarter about the obvious problems. They are earlier about the non-obvious ones. They have trained themselves to look at what the system is doing before it produces a result, not after. It is a completely unglamorous skill that almost never gets anyone a performance bonus, and it is one of the most valuable things you can do.

This is a skill that can be developed, but it requires deliberately breaking a very stubborn habit, which is the habit of waiting for confirmation before you act. Most organizational cultures reward the person who responds decisively to the crisis. Very few reward the person who prevented the crisis from happening, because prevention is invisible and response is visible, and visibility is how most organizations distribute credit.

If you are running a defense program, an operations team, or any complex system where failure is expensive, the question worth asking is not "What are our defect rates?" but "What are the upstream conditions that produce defects, and are we actually measuring them?" The answer to that second question will tell you something the first never can, which is whether you are managing outcomes or managing the system that produces them.

Those are not the same job. One of them is much harder and much more valuable than the other. One of them also makes for considerably less dramatic all-hands meetings, which is either a benefit or a drawback depending on your relationship with drama.

The cleaning agent was trying to tell someone something for a long time before I looked at it. Most systems are.

See you next Tuesday.

Alicia

P.S. — If someone forwarded you this issue, you can subscribe at newsletter.theoperatorsplaybook.net. If you are already here, thank you. I wrote this one for you.
