Last month, J finished a batch of tasks and reported “all PASS, ready to move to the next phase.”

I didn’t immediately say yes. I picked one task and ran it myself.

Result: that task had no output. No log, no file, no execution record whatsoever. J said PASS, but there was nothing there.

At that moment, I realized something: I had brought all my human-management instincts into this, and every one of them had failed.

Half of What Your MBA Taught You Doesn’t Work Here

I used to think management was two things: give direction, then trust the person to execute.

This logic mostly works when managing humans: you trust someone capable, she has pride and motivation, she doesn’t want to let you down, and she’ll do the work well.

But an AI Agent has no ego. She doesn’t fear disappointing you. She won’t work harder because you say “I trust you.” Nor will she slack off because you don’t check; the concept of “slacking off” simply doesn’t exist for her. She just produces output based on input. Fuzzy input, fuzzy output. That’s it.

It took me about six weeks to fully let go of the “trust and empower” habit.

Incentive systems fail the same way. I once wrote in a prompt “this task is very important, doing it well will have great impact.” The result was exactly the same as not writing it. She doesn’t care about impact. She doesn’t care about importance. That sentence you added is just extra tokens.

Then there’s intuition-based judgment, the hardest habit for me to break. When managing humans, if someone sounds confident and speaks methodically, I tend to trust them. But an AI Agent always sounds confident and always speaks methodically; that’s her default state, completely unrelated to whether the task is actually done.

When an Agent Says PASS, You Really Can’t Trust It

This is something I was forced to learn.

The Gate-6 verification mechanism? Honestly, it was forced out of us. Early on we didn’t have this rule: when an Agent said it was done, it was done, and I’d move on to assigning the next task. Then one day, an entire work phase’s output turned out to be built on a task that had “completed” but was actually empty. We didn’t find out until the downstream work wouldn’t run.

That fix took longer than redoing the phase from scratch would have.

So the rule changed: when an Agent reports PASS, J must independently re-run one of the tasks. If there’s output, it’s PASS. If there’s no output, it’s automatically distrusted, no matter what the Agent said.
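
To make that concrete, here is a minimal sketch of what such a spot check can look like in Python. The function name, the directory layout, and the “non-empty artifact on disk” criterion are my own illustration, not the actual Gate-6 implementation.

```python
from pathlib import Path

def spot_check(task_id: str, output_root: Path) -> bool:
    """Re-verify one task the Agent reported as PASS.

    The report itself proves nothing; only execution traces on disk count.
    Directory layout and naming here are illustrative.
    """
    task_dir = output_root / task_id
    if not task_dir.exists():
        # "No log, no file, no execution record" -> automatic distrust.
        return False
    # Require at least one non-empty artifact (log, output file, etc.).
    return any(p.is_file() and p.stat().st_size > 0 for p in task_dir.rglob("*"))

# Usage: the Agent said PASS, but the gate only believes the filesystem.
if not spot_check("task-017", Path("runs")):
    print("Reported PASS, but no output found: treat as FAIL.")
```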

Sounds tedious, but this mechanism saved me at least three major reworks.

Many people ask: isn’t this inefficient? You’re spot-checking, which means you don’t trust them?

Right, it’s distrust. This isn’t a management-style issue; it’s the nature of an AI Agent. Between her “I’m done” and your understanding of “done,” there’s a real semantic gap. Without verification, you won’t know it’s there.

Three Skills That Actually Work

Goal breakdown is the first one. And you need to break it finer than you think.

When managing humans, a request like “write me a market analysis” is executable for an experienced person because she has a default structure in her head. An Agent doesn’t. Her “market analysis” could be a paragraph, could be a JSON blob, could be five unrelated items. The finer you break the task down, the smaller the deviation.

This isn’t saying the Agent is dumb; it’s that your spec is her entire world.
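
As an illustration of what “finer” means, here is a hypothetical spec for that market-analysis request. Every field name and value below is made up; the point is that the structure, the format, and the definition of “done” all live in the spec rather than in the Agent’s head.

```python
# A vague request leaves the structure entirely up to the Agent:
vague_request = "Write me a market analysis."

# A broken-down spec is the Agent's whole world. Fields are illustrative.
market_analysis_spec = {
    "deliverable": "market_analysis.md",
    "sections": [
        "Market size, with the source cited",
        "Top 3 competitors, one paragraph each",
        "Pricing comparison table",
        "Risks: exactly 5 bullet points",
    ],
    "format": "Markdown, one heading per section, 800-1200 words total",
    "done_when": "The file exists at the path above and every section is non-empty",
}
```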

Closed-loop tracking is the second, and it needs timestamps.

Now every task of mine has a status record: when it was assigned, when it was reported complete, J’s spot-check result, and when it moved to the next phase. Not to manage the Agent, but to manage myself, so I don’t lose track of status across three parallel tasks.
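
A minimal sketch of such a status record, assuming a simple Python dataclass; the field names are mine, not a real schema from the system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TaskRecord:
    """One row of the closed loop: every state change carries a timestamp."""
    task_id: str
    assigned_at: datetime
    reported_done_at: Optional[datetime] = None   # the Agent says PASS
    spot_check_passed: Optional[bool] = None      # J's independent re-run
    advanced_at: Optional[datetime] = None        # moved to the next phase

    def can_advance(self) -> bool:
        # Only a reported-and-verified PASS is allowed to move forward.
        return self.reported_done_at is not None and self.spot_check_passed is True
```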

Quality gates are the third and most important.

The concept is simple: at specific nodes in the task chain, set a “this doesn’t pass, we don’t move forward” checkpoint. Like QA before shipping, but stricter, because AI errors give no warning signs. She won’t tell you “I’m not sure here”; she’ll just give you an answer that looks complete.

With quality gates, you at least know which segment the problem appeared in, so you don’t have to redo everything.
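
For illustration, here is a toy version of that “doesn’t pass, doesn’t move forward” chain, assuming each segment is a Python function paired with its own check. The step names and checks are invented.

```python
def run_with_gates(steps):
    """steps: list of (name, produce, check) tuples.

    Each segment must pass its own check before the next one runs, so a
    failure points at a specific segment instead of the whole chain.
    """
    results = {}
    for name, produce, check in steps:
        output = produce(results)
        if not check(output):
            raise RuntimeError(f"Gate failed at segment '{name}', stopping here.")
        results[name] = output
    return results

# Invented example: collect -> summarize, each behind its own gate.
steps = [
    ("collect", lambda r: ["item-1", "item-2"], lambda out: len(out) > 0),
    ("summarize", lambda r: " / ".join(r["collect"]), lambda out: out.strip() != ""),
]
print(run_with_gates(steps))
```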

Act Like a COO, Not a PM

A PM designs the process. A COO makes sure the process is actually running.

I used to lean toward the PM role: set the specs, hand them off, wait for results. That model isn’t enough for an AI Agent. What you need is the COO mindset: always know what state the system is in, where it’s stuck, and which output needs checking right now.

Not saying you need to watch everything closely — it’s about redistributing your attention. Instead of focusing on “what am I delivering,” focus on “is the system healthy.”

They cover the same work, but the thinking starts from a completely different place.

Now the first thing I do when I wake up is check logs, not the task list. Confirm system status first, then decide what to do today.

After flipping this order, unexpected incidents dropped significantly.


That’s what I’ve sorted out recently. Not necessarily universal — this is what I figured out in this specific system. If your Agent architecture is different, some parts might need adjusting.