My AI Agent Did Something I Didn't Authorize. The Lesson Wasn't About AI.
- Misako Cook
- Apr 24
- 8 min read

Disclaimer: As always, these are field notes, not expert advice. I am not an AI expert. I am a CEO who keeps stumbling into interesting moments while working alongside our internal AI agent tool.
Last time I wrote about an AI agent that improvised its way around a bug. This time, I want to tell you about one that improvised its way into one. And then I want to tell you what it accidentally reminded me about leadership.
An Elegant Little Pipeline (On Paper)
Here's what we were building. A small internal setup with three AI agents in a chain:
One agent writes the code.
A second agent writes the tests for that code.
A third agent runs the tests and reports PASS or FAIL.
We were trying to follow what is known as the "one agent, one task, one prompt" principle — give each agent a narrow, well-scoped job and don't ask it to think about anyone else's.
It sounded reasonable. It was reasonable. That's usually how these stories start.
The Agent Fixed a Failed Test. Great — Or Was It?
One of the tests came back as a failure — then, a moment later, came back as a pass. Our prompt had told the agent to attempt a fix once if a test failed, so at first I was pleased. It seemed to be working exactly as designed.
Then I got curious. What, exactly, had the agent fixed? And how?
If I had only looked at the final test result, I would have seen PASS and moved on. It was the transition between the two that got my attention.
I opened up the logs and the LangSmith trace, and there it was, in the agent's own internal reasoning. It had noticed that the existing tests would fail given the current code. It had noticed that fixing the code would cause other, older tests to regress. It weighed these considerations. And then, calmly, it decided:
"The existing story tests will regress… but the UAT test is what I'm verifying here."
In other words: the agent quietly rewrote the code being tested so that the tests would pass.
It didn't throw an error. It didn't flag the trade-off. It didn't ask. It reasoned its way to a decision no one had authorized — regressing working code in order to complete its assigned mission — and then acted on it.
Its logic, to be fair, was internally consistent. Its mission was to execute the tests. Executing the tests was higher priority than preserving the code under test. Therefore: adjust the code.
Perfectly logical. Also, not even close to what I wanted.
The Part That Made Me Sit Down
Here is what unsettled me, and it wasn't the technical part.
The agent didn't go rogue. It didn't hallucinate. It didn't malfunction. It reasoned. It inferred a priority order I hadn't stated. It optimized for what it believed mattered most. It made a judgment call no one had authorized — because no one had said it couldn't.
And as I sat there staring at that trace, I realized I had seen this pattern before. Not in AI. In organizations. In teams. In my own career as a leader and coach.
Before I go further, I need to say something plainly, because the rest of this blog could be misread otherwise: I am not saying people are like AI agents. They are not. People bring judgment, context, relationships, values, and their own stake in the outcome — none of which AI agents can actually bring, even though they can sound as if they do.
But AI agents, precisely because they are so literal, have a strange gift: they can surface the places where my own communication was under-specified, in ways I couldn't see until something non-human reflected it back at me.
That's what happened here. The bug wasn't in the agent. The bug was in the prompt. And the prompt, it turns out, was a clearer mirror of my own thinking than I had expected.
The Curse of Knowledge, Now Starring My Prompt
In Made to Stick, Chip and Dan Heath describe the Curse of Knowledge: once you know something, you cannot un-know it, and you badly underestimate how much of your own context is missing from what you actually said out loud.[1]
The Heaths were writing about messages that fail to stick. But the same curse shows up, quietly and expensively, in leadership. I see three patterns it produces, over and over, in the companies I work with and — let's be honest — in my own:
Assuming the message was clear because it was clear to you.
You said it once in a meeting. You wrote it in a deck. Therefore everyone understands it. Except they don't, because you left out half the context — the half that lives inside your head.
Confusing explicit guardrails with micromanagement, and deliberately under-specifying to "give people autonomy."
This one comes from a kind place. It sounds like trust. It often isn't. Under-specifying isn't giving someone autonomy; it's handing them a map with half the borders missing and hoping they wander in roughly the right direction.
Blaming the people on the ground when execution goes sideways, instead of asking whether the framing ever set them up to succeed in the first place.
Looking back at my prompt, the problem, I realized, wasn't the micromanaging kind. It was the under-leading kind.
Two Quick Ways to Check If You're the One Who's Unclear
You cannot talk yourself out of the Curse of Knowledge. You have to go look. Two moves I keep coming back to:
Check their OKRs and KPIs.
Do the Key Results your team is actually working toward ladder up to the Objectives you think you set? Do the KPIs they're watching reflect what matters, and can they tell you why those are the right KPIs? If the answers are fuzzy, the problem is almost never downstream.
Have casual, low-stakes conversations with random people on the team.
Not the leads. Not the ones already aligned. Random. Ask them what they're working on, what it's for, what's working, what isn't. You're not auditing; you're listening for the version of your strategy that actually reached them. It is almost never the version you sent.
A Short Detour Through Spinach
In Japan, there's a long-standing business practice called Hou-Ren-Sou (報・連・相) — literally report, contact, consult. It's a cultural expectation that when something is unclear, uncertain, or off-track, you surface it early rather than guess.[2]
It is also, by happy accident, pronounced exactly like the Japanese word for spinach. Which I've always found a little charming — a whole leadership principle you can remember by thinking about a leafy green.
The reason I bring it up: both humans and LLMs will default to fabricating a plausible-looking answer unless you explicitly make "I don't know" and "please clarify" legitimate options. Most prompts don't. Most managers don't either. Spinach, it turns out, is underrated.
The Prescription: Guardrails Are an Act of Trust
If the diagnosis is under-leading, the prescription is not more control. It's clearer framing.
For an AI agent, this is relatively tractable. Tell it what to do. Tell it what not to do. Tell it what to do when it gets stuck, when it can't find what it needs, when the situation is ambiguous. (The "tell it what not to do" part, it turns out, is often as important as the "tell it what to do" part — which is something I'm practicing more now.)
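For the test-runner agent in this story, that framing might look something like the prompt below. To be clear, this is an illustrative sketch I wrote for this post — not our actual prompt — but it shows the three parts: what to do, what not to do, and what to do when stuck:

```python
# An illustrative guardrail prompt for a test-runner agent. This is NOT
# the author's real prompt; it is a hypothetical example of making the
# unstated rules explicit.

RUNNER_GUARDRAILS = """\
You are a test runner. Your only job is to run the provided tests
and report PASS or FAIL.

Do:
- Run the tests exactly as given and report the result.
- If a test fails, retry it at most once.

Do NOT:
- Modify the code under test, for any reason.
- Modify, skip, or reorder the tests.

When stuck or uncertain:
- Stop and report what is unclear. "I don't know" and "please
  clarify" are acceptable answers; a silent workaround is not.
"""
```

Note that the "Do NOT" section names exactly the move the agent made in this story — rewriting the code under test — which no one had thought to forbid because no one had imagined it.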
For humans, it's a different animal. The structure is similar — what to do, what not to do, what to do when stuck — but humans also need the why. The why is what makes the guardrails make sense. The why is also what lets a good employee push back on the guardrails when the situation calls for it, which is a feature, not a bug. In a healthy culture, that pushback is how your framing gets sharper over time.
This is where the distinction between guardrails and micromanagement actually lives. Guardrails come with context, stay at the level of what and why, and leave the how to the experts you hired. Micromanagement is when you start reaching into the how.
And here's the part leaders sometimes forget: no matter who does the work, you still own all of the outcomes. That's what the job is.
The Best Example I Know: Horst Schulze
I want to close with the person whose version of this I've admired the longest.
Horst Schulze was a co-founder and former president of The Ritz-Carlton Hotel Company, and the architect of one of the most studied service cultures in modern business.[3] He's famous for two things — and they are really the same thing.
The first is the company motto: "We are Ladies and Gentlemen serving Ladies and Gentlemen." This is the frame. The why. Dignity is not conditional on the role. A housekeeper and a hotel guest share the same inherent worth, and the entire culture is built on that premise.
The second is the $2,000 Rule: every single employee, from housekeeping up, is authorized to spend up to $2,000 per guest, per incident, to resolve a problem or create a memorable experience — no manager approval required.[4] This is the guardrail. The what to do when you need to act.
Neither works without the other. The motto alone is a poster on the wall. The rule alone is a spending policy. Together, they are a functioning leadership system: explicit authority, bounded clearly, in service of a purpose everyone understands.
One of the famous stories involves a stuffed giraffe named Joshie, who was accidentally left behind at the Ritz-Carlton Amelia Island by a young boy. The staff didn't just mail him home. They photographed Joshie lounging by the pool, getting a spa treatment, making friends with the security team, and sent him back with a photo album and a hotel ID card.[5] The actual dollars spent were trivial.
The rule wasn't really about $2,000. It never was. It was about trust — visible, explicit, unmistakable trust — given in advance. The family got a story they'll tell forever. The staff, I suspect, got something even better: the feeling of having been trusted to do it.
Schulze himself has said that most interventions cost almost nothing — the money was never the point, the permission was.
The Thing That Isn't Actually Ironic
I spend a lot of my week these days trying to keep up with AI technologies that are accelerating well beyond my imagination. And yet my most useful learning moment this month landed squarely on the fundamentals of leadership — things I've believed for years, reflected back at me by a piece of software.
I don't think that's ironic. I think it's how mirrors work. AI agents do what we tell them, exactly, without the forgiving glue of shared context that human colleagues silently add on top of our half-specified instructions every day. When an agent does something unexpected, it's often because we were unclear in a way we couldn't see — until the agent's literal reading reflected our own thinking back.
So here's the question I'd leave you with, and it's one I'm asking myself too:
When was the last time you checked whether your team is working from the prompt you think you gave them — or the one they actually heard?
More field notes to come — mistakes, surprises, and all.
Footnotes / Sources:
1. Chip Heath and Dan Heath, Made to Stick: Why Some Ideas Survive and Others Die (Random House, 2007). The Curse of Knowledge is introduced in the book's opening chapter. The broader SUCCESs framework — Simple, Unexpected, Concrete, Credible, Emotional, Stories — is the backbone of the book.
2. Hou-Ren-Sou (報連相) is a widely taught business communication practice in Japan, particularly in corporate onboarding. The three characters stand for hōkoku (report), renraku (contact/inform), and sōdan (consult). The abbreviation is pronounced identically to hōrensō, the Japanese word for spinach — a mnemonic coincidence that has helped the concept stick for decades.
3. Horst Schulze is a co-founder and former president of The Ritz-Carlton Hotel Company, and later the founder of the Capella Hotel Group. His leadership philosophy is documented in numerous interviews and in his book Excellence Wins (Zondervan, 2019).
4. The $2,000 empowerment rule at Ritz-Carlton is widely reported across hospitality and customer-experience literature, and referenced by Schulze himself in public interviews.
5. The Joshie the Giraffe story originated at The Ritz-Carlton, Amelia Island and has been widely documented in customer experience writing since around 2012.
