I have become increasingly skeptical of workflows that showcase model capability without improving engineering clarity. A workflow that feels impressive in a demo but creates review debt, noisy diffs, or brittle test suites is not leverage. It is just a different kind of queue.
The patterns I keep are the ones that operate close to real engineering bottlenecks. Retrieval helps when internal knowledge is fragmented and lookup cost is high. Automated review helps when it surfaces obvious risks before a human reviewer spends time on the change. Test generation helps when it expands coverage in repetitive paths that engineers would otherwise skip.
What survives contact with production
- Retrieval pipelines that improve the quality of answers without hiding source context (a minimal sketch follows this list).
- Review automation that points at concrete risks instead of generating generic praise.
- Test assistance that accelerates coverage but still leaves engineers in charge of intent.
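To make the first item concrete, here is a minimal sketch of what "without hiding source context" means to me. Everything in it is illustrative: the `Chunk` type, the keyword-overlap scoring, and `build_prompt` are stand-ins for whatever retrieval stack is actually in place. The only point is that source identifiers travel with the text and come back out with the answer.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    source: str  # where the text came from: a repo path, wiki page, ticket id
    text: str

def build_prompt(question: str, chunks: list[Chunk], top_k: int = 3) -> tuple[str, list[str]]:
    """Rank chunks, build a prompt, and return the sources alongside it.

    Scoring here is naive keyword overlap so the sketch stays self-contained;
    a real pipeline would use an embedding index. What matters is that
    `source` stays attached to every chunk and is returned with the prompt.
    """
    terms = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(terms & set(c.text.lower().split())),
        reverse=True,
    )[:top_k]
    context = "\n\n".join(f"[{c.source}]\n{c.text}" for c in ranked)
    prompt = (
        "Answer using only the context below and cite the bracketed sources.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return prompt, [c.source for c in ranked]
```

The caller can then render those sources next to the answer, which is what makes the output auditable rather than merely plausible.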
The common thread is that these workflows reduce time spent searching, restating, or scaffolding. They do not replace technical judgment. In practice, the workflows that last are the ones that are easy to audit and easy to turn off when they stop being useful.
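The same idea applies to the second item in that list. A review step earns its place when it stays silent unless it can point at a specific line and name a specific risk. A toy sketch follows, with made-up rule patterns standing in for whatever checks matter in a given codebase; real setups usually combine rules like these with model-generated findings.

```python
import re

# Illustrative patterns only; a real rule set would be tuned to the codebase.
RISK_RULES = {
    r"except\s*:\s*$": "bare except swallows errors",
    r"\bTODO\b": "unresolved TODO added in this change",
    r"time\.sleep\(": "sleep in a code path; likely source of flakiness",
}

def review_added_lines(unified_diff: str) -> list[str]:
    """Return concrete findings for added lines; say nothing when nothing fires."""
    findings = []
    for lineno, line in enumerate(unified_diff.splitlines(), 1):
        if not line.startswith("+") or line.startswith("+++"):
            continue  # only inspect added lines, skip the file header
        for pattern, risk in RISK_RULES.items():
            if re.search(pattern, line):
                findings.append(f"diff line {lineno}: {risk}")
    return findings
```

No finding, no comment. That silence is what keeps the tool from drifting into generic praise.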
What I discard
I discard anything that increases output volume faster than it increases signal. That includes auto-generated code that nobody wants to maintain, test suites that pass without proving behavior, and “AI-first” flows that mostly move confusion into a different layer of the process.
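To be specific about "test suites that pass without proving behavior", here is the contrast I mean, using a made-up `parse_duration` helper purely for illustration:

```python
def parse_duration(s: str) -> int:
    """Toy function under test: '90s' -> 90, '2m' -> 120."""
    value, unit = int(s[:-1]), s[-1]
    return value * {"s": 1, "m": 60}[unit]

# Passes as long as nothing raises, which proves almost nothing.
def test_parse_runs():
    parse_duration("90s")

# Pins down intent: a regression in the unit table actually fails.
def test_parse_converts_to_seconds():
    assert parse_duration("90s") == 90
    assert parse_duration("2m") == 120
```

The first kind inflates the suite without adding signal; only the second earns its maintenance cost.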
Good AI engineering is quieter than it looks from the outside. It usually appears as cleaner retrieval, faster review cycles, or more consistent coverage. The most valuable outcome is not spectacle. It is better delivery with less drag.