7 min read · Agents

Make Your Codebase Agent-Friendly

#AI #Agents

Your codebase determines whether agents deliver quality work or produce poor results. The same model with the same prompts can yield very different outcomes: one repository gets clean pull requests while another gets messy code. The difference is usually the codebase, not the model.

Agentic engineering struggles when humans become the bottleneck. The models simply want to work.

Make the codebase verifiable by agents. Then let the agent manage the output.

Three key factors matter:

  • Environment: the agent can operate the system on its own.
  • Intent: the agent understands your goals and motivations.
  • Feedback loops: the agent can verify its own work quickly.

Before diving into those elements, let’s make a quick distinction.

Agentic engineering is not the same as vibe coding

Vibe coding gained popularity in early 2025. It involves describing what you want, accepting what the model offers, and not scrutinizing the output too closely. This approach works well for throwaway scripts and weekend projects.

Agentic engineering is different. You use agents to write code, but you remain involved as the architect and reviewer. You define the scope of work, outline the architecture, and assess the output as you would with a junior’s pull request.

The agent writes the code, but you own the system.

1. Environment

Provide the agent with the same access you have. If it can’t start the app, reach the APIs, or open a browser, the quality of the model doesn’t matter. You’ve limited its potential before it even writes a line.

First question: Can the agent create an isolated copy of the project for each task? Consider using git worktree, separate clones, or cloud sandboxes, depending on what your stack supports.
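With git, one low-friction way to get per-task isolation is a worktree per task: the same repository, but a separate working directory and branch, so parallel agents don’t trample each other. A minimal sketch (paths and branch names are illustrative):

```shell
# Give each task its own checkout next to the main one.
git worktree add ../myproject-task-123 -b task-123

# The agent works inside ../myproject-task-123; your main
# checkout stays untouched.

# When the task's pull request lands, clean up.
git worktree remove ../myproject-task-123
git branch -d task-123
```

Worktrees share the object store, so they are cheap to create compared with full clones.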

Can you run five agents in your repository at the same time right now? If not, identify the obstacles. It’s a limitation in your system that needs addressing.

I often work across multiple repositories or run several agents in isolated environments within a single monorepo. As their failure rates decrease, it becomes feasible to manage more of them simultaneously. Four months ago, running ten agents at once felt reckless. Now, it seems manageable.

Agents also require access to credentials. Can the agent retrieve an API key and test the endpoint it created? Access the database? Navigate through the app as a customer would?

If any of these actions require your intervention, you are the bottleneck. Remove yourself from the process.

Keep validation affordable

If every test run costs money, you won’t want to operate hundreds of agents, and you might not even enjoy running one.

Use mock services for expensive operations. Capture-and-replay proxies work well. Setting up a fake API is inexpensive with agents. I’ve had Claude Code do it in about ninety seconds.

Make the codebase easy to navigate

If the codebase is understandable for people, it’s likely understandable for agents as well.

Complicated abstractions and undocumented practices can confuse agents, just as they confuse new hires. A messy codebase will result in the agent continuing the mess. I’ve seen it happen repeatedly. Hand it a repository with no consistent patterns, and it confidently adds even more inconsistencies.

Noisy test output is another issue. If your runner produces thousands of lines of output before indicating a failure, the agent’s context becomes cluttered, making it hard to pinpoint the problem.

2. Intent

Context engineering matters. By now, everyone knows that. However, I’ve noticed that many teams don’t invest enough in it.

You don’t need a 1,000-line AGENTS.md file to guide the agent in running your test suite. It already knows how to use a computer. The challenge is providing your domain knowledge, including your business rules and the decisions you made three years ago that may not be documented anywhere.

Document your domain knowledge

A lot of context resides in Slack threads, neglected Notion pages, and the minds of long-term employees. The agent can’t access any of this unless you make it available.

Document architecture decisions, domain glossaries, and deployment constraints. If it’s not documented, it doesn’t exist for the agent.

Let the codebase speak for itself

Files like CLAUDE.md, .cursorrules, and AGENTS.md give the agent context about conventions and constraints. However, be cautious not to overdo it.

Research from ETH Zurich published this month indicates that these files can hinder performance if they are auto-generated or filled with information that agents can figure out by examining the code. My advice: only include what the model can’t infer on its own.
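As an illustration, an agents file earns its place when it carries facts the model cannot recover from the code itself. Every entry below is hypothetical:

```markdown
# AGENTS.md

## Domain rules
- "Order" and "Booking" are distinct entities; never merge their tables.
- Prices are stored as integer cents. Floats caused rounding bugs in 2022.

## Constraints
- The legacy /v1 API is frozen; new endpoints go under /v2.
- Migrations must be backward compatible; we deploy with zero downtime.
```

Note what is absent: how to run tests, directory layout, language conventions. The agent can infer all of that by reading the repository.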

Types, names, and structure are also forms of communication. processData is unclear, while validateAndNormalizeUserAddress is clear. Self-documenting code has always been important. It matters even more when agents read it, because they lack institutional memory.
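To make the contrast concrete, here is a sketch of what a self-documenting signature buys you. The entity and the normalization rules are invented for illustration; the point is that the name and types state the contract before anyone reads the body:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserAddress:
    street: str
    city: str
    postal_code: str

def validate_and_normalize_user_address(raw: dict) -> UserAddress:
    """Reject incomplete addresses; trim whitespace and normalize casing."""
    for field in ("street", "city", "postal_code"):
        if not raw.get(field, "").strip():
            raise ValueError(f"missing required address field: {field}")
    return UserAddress(
        street=raw["street"].strip(),
        city=raw["city"].strip().title(),
        postal_code=raw["postal_code"].strip().upper(),
    )
```

An agent that only sees the signature already knows what goes in, what comes out, and that invalid input raises rather than silently passing through. A bare process_data(d) tells it none of that.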

Every unnecessary layer of abstraction between the agent and the actual values can lead to errors. Question the necessity of each layer before adding another.

Scope tasks carefully

Small, clear, verifiable tasks with the right context attached are more effective than vague instructions.

Agents are improving at tackling open-ended tasks like “here’s a problem, go debug it.” But knowing which tasks you can safely hand off takes experience, and the only way to build it is to use the agents regularly.

Architecture is still your responsibility

Involving humans in system design is essential right now. The agent can implement, but determining the system’s structure is up to you.

I refine architecture using a frontier model. Plan your approach, test it, then delegate implementation to agents with a narrow scope.

Review the code generated by agents. Even when it works, especially when it works, maintain a mental model of how the system is evolving.

3. Feedback Loops

The agent should never have to ask you if changes were successful.

Every change must be verifiable by machine, and that verification should occur automatically, not because the model remembered to trigger it.

Start simple

Use linters, type checkers, formatters, and static analysis to catch obvious problems before running more complex tests.

If something is important to you, test it. That principle remains unchanged.

Static analysis tools like SonarQube now support MCP servers. The agent can query results directly instead of waiting for you to provide them.

Fewer tests, better tests

Focus on unit, integration, and end-to-end tests that are deterministic.

Write behavioral tests. If a refactor that changes the code without changing behavior forces you to rewrite the tests, the tests are coupled to implementation details and are not serving their purpose.
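A quick sketch of the distinction, with an invented pricing rule: the test below pins down a business promise, so any internal rewrite of the function that keeps the promise passes unchanged.

```python
def apply_discount(total_cents: int, percent: int) -> int:
    """Hypothetical pricing rule: discounts are clamped to 0-100%."""
    percent = max(0, min(percent, 100))
    return total_cents - (total_cents * percent) // 100

# Behavioral: states the rule, not the mechanism. It survives any
# refactor of apply_discount that preserves the rule.
def test_discount_is_capped_at_the_full_price():
    assert apply_discount(10_000, 150) == 0
    assert apply_discount(10_000, 25) == 7_500
    assert apply_discount(10_000, -5) == 10_000
```

A test that asserted the clamping happens via max/min, or that a particular helper was called, would break on every harmless refactor and teach the agent to edit tests instead of fixing code.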

Enforce structure mechanically

OpenAI’s harness engineering post describes a pattern that resonates with me.

Using dependency constraints and architectural rules as tests can catch issues that code reviews might miss. Tools like dependency-cruiser can automatically enforce module boundaries.
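For instance, a dependency-cruiser configuration can turn an architectural rule into a failing check. This is a minimal sketch; the rule name, paths, and layering are illustrative, so adjust them to your layout:

```javascript
// .dependency-cruiser.cjs — one hypothetical boundary rule.
module.exports = {
  forbidden: [
    {
      name: "ui-must-not-import-db",
      severity: "error",
      comment:
        "UI components must go through the service layer, never the database directly.",
      from: { path: "^src/components" },
      to: { path: "^src/db" },
    },
  ],
};
```

Run it in CI and in the agent’s loop, and the boundary holds without anyone re-explaining it in review.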

When writing custom lint rules, turn error messages into remediation instructions. Instead of "Error: invalid import path," use "Error: use @/components instead of relative paths." The error output then becomes context for the agent to act upon immediately.

Verification should be part of the agent’s workflow

More of your checks should run inside the agent’s loop, while it works, not after it opens a pull request.

Use Git hooks, pre-commit checks, or any mechanism that guarantees validation takes place. Don’t rely on the model remembering your rules.
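For example, a pre-commit hook makes validation unskippable regardless of whether the model remembers it. A sketch, assuming npm scripts; substitute your project’s own commands:

```shell
#!/bin/sh
# .git/hooks/pre-commit — runs before every commit the agent makes.
set -e             # the first failing check blocks the commit

npm run lint       # cheap checks first...
npm run typecheck
npm test           # ...expensive ones last
```

Because the hook fires on every commit, a failed check lands in the agent’s context immediately, while the change is still small enough to fix.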

I sometimes wonder if this structure will become unnecessary as models improve. Perhaps it will. But right now, the limitations are real, and these guardrails are proving valuable. I’d prefer to put them in place and remove them later rather than face the consequences of not having them.

Conclusion

Much of what makes a codebase friendly for agents also makes it good for humans. Clear abstractions, behavioral tests, documented domain knowledge, and quick feedback are essential. Agents haven’t created these needs; they have just made neglecting them costly in a way that’s hard to ignore.

This investment builds over time, and the returns increase as the models advance.

Further reading