Agent Behavior Before And After AgentPack

Before

Task: fix auth token expiry.

The agent starts cold. It searches for auth, opens router files, follows imports, checks config, opens tests, and repeats that exploration after each interruption. The useful files are eventually found, but the first several turns are spent building a map that is not measured or reusable.

Typical cost:

Step	Behavior
Search	Broad `rg` queries over auth/session/token names
Read	Several unrelated routes, middleware files, and config files
Verify	Test files found late or missed
Repeat	Same orientation work returns in later sessions

After

With MCP:

start_task("fix auth token expiry")

AgentPack writes .agentpack/task.md, ranks the repo, and returns a compact map:

Rank	File	Why
1	`src/auth/token.py`	filename/content match, implementation role
2	`src/auth/session.py`	direct dependency, second-pass recall neighbour
3	`tests/test_auth.py`	paired test

The agent still verifies the source before editing. The difference is that it starts from a measured set of likely files, then uses explain_file, get_related_files, and benchmark --misses when the map looks incomplete.

Benchmark Proof

Use real historical tasks:

agentpack benchmark --init
agentpack benchmark --compare --misses --public-table
agentpack benchmark --public-repos --prove-targets --misses --public-table

Publish benchmarks/results/YYYY-MM-DD-public.md when the task set is real and the expected files are the files actually changed.