Vize

Testing & Agents

Agentic Coding changes the role of tests.

When a human writes a small patch, tests tell us whether the patch broke something.

When an agent can rewrite large pieces of code, tests also become the language we use to tell the agent what "good" means.

That makes tests more important, not less.

Tests Are the Memory of the Project

Agents are good at local reasoning, but a project is larger than the current prompt.

A toolchain has accumulated decisions:

  • what diagnostics should say

  • where source spans should point

  • how generated code should look

  • which Vue edge cases are supported

  • which real projects must keep compiling

  • which false positives are unacceptable

Tests preserve those decisions.

Without tests, every agentic change is forced to rediscover the project from scratch. With tests, the project can push back. It can say: this behavior matters, this output is intentional, this error message is part of the user experience.

Snapshot Tests Are Especially Useful

Vize uses a lot of snapshots because toolchains produce structured output that humans need to inspect:

  • compiler output

  • formatter output

  • linter diagnostics

  • virtual TypeScript

  • source-mapped diagnostic locations

  • generated Musea metadata

  • build artifacts from fixture projects

Snapshots are not a substitute for assertions. They are a way to make broad behavior reviewable.

That matters for Agentic Coding because agents can create large diffs quickly. A good snapshot suite makes those diffs visible in a form humans can review. It turns "something changed somewhere in the compiler" into "this render output changed in exactly this case."

That is a much better review surface.

Determinism Is the Contract

Agentic workflows need deterministic tools.

If tests are flaky, the agent cannot tell whether its patch helped. If output order changes between runs, snapshots become noise. If diagnostics depend on ambient machine state, CI becomes a lottery.

So Vize cares about boring details:

  • stable output ordering

  • stable diagnostic IDs

  • stable source spans

  • stable generated code shape

  • stable fixture setup

  • isolated scratch directories

Determinism is not only for CI. It is what lets humans and agents share the same feedback loop.

Real-World Fixtures Keep the System Honest

Unit tests are necessary, but Vue tooling lives in real projects.

Real projects have:

  • unusual import graphs

  • package-manager layouts

  • generated files

  • macro conventions

  • style preprocessors

  • huge component trees

  • old patterns next to new patterns

That is why Vize keeps testing against real-world fixtures and snapshots. The goal is not to claim production readiness too early. The goal is to find every sharp edge that only appears outside a perfect sample app.

This kind of exhaustive checking is slow to build, but it is the path from experiment to real tool.

Tests Are a Conversation with the Community

Community feedback is not only issue comments.

It is also:

  • a real project that fails to compile

  • a diagnostic that points to the wrong span

  • a false positive that blocks adoption

  • a performance cliff in a repository nobody predicted

  • a production pattern that the toolchain did not understand

Every one of those reports should become a fixture, a regression test, or a benchmark.

That is how feedback becomes memory. That is how an unofficial experimental tool becomes more serious over time.

Agents Need Smaller, Better Loops

The worst testing setup for agents is one giant slow command that fails at the end with an unclear message.

The best setup gives layered feedback:

  • fast unit tests for local invariants

  • snapshot tests for output review

  • fixture tests for framework behavior

  • focused integration tests for tool boundaries

  • CI matrices for platforms and production builds

Agents can use that ladder. Humans can too.

This is one reason Vize keeps investing in test tooling and script consolidation. A good project should make the right check easy to run, easy to understand, and easy to scale up when the risk increases.

Trust Is Earned Repeatedly

No toolchain becomes trustworthy because its README says "fast" or "correct."

Trust is earned every time:

  • a diagnostic is precise

  • a fix does not damage nearby code

  • a snapshot change is explainable

  • a real-world project keeps passing

  • CI catches something before a release

  • an agent can iterate without losing the thread

That is why testing is not a side quest for Vize.

It is part of the product.

In the AI era, the best tools will not be the ones that generate the most code. They will be the ones that can generate, validate, explain, and reject code in tight, deterministic loops.

Tests are where those loops become real.