Inside Playwright CLI: Browser Automation Built for Coding Agents

Microsoft's playwright-cli gives coding agents a token-efficient way to drive a browser from the terminal. We cloned it, read the source, and broke down how the daemon, refs, and skills actually work.

Oskar Kwasniewski, CTO

June 10, 2026 · 10 min read

Microsoft quietly shipped one of the most interesting pieces of agent tooling this year: playwright-cli. It is browser automation exposed as plain shell commands, designed specifically for coding agents like Claude Code and GitHub Copilot.

The pitch is simple: agents already know how to run CLI commands. Instead of loading MCP tool schemas and full accessibility trees into the model context on every step, the agent runs playwright-cli click e15 and gets back a few lines of text.

We cloned the repo, read the source, and ran it against real pages. This post covers what we found: the daemon architecture, how element refs work, how it reuses the MCP tool layer, and how to actually use it in your workflow.

What playwright-cli is

Install it globally and you get a single binary that controls a persistent browser:

Each command is a separate process invocation, but the browser stays alive between calls. State, cookies, and the page itself persist across commands. That is the core trick that makes a CLI viable for multi-step automation.

The output is deliberately compact. Here is what open actually prints:

Notice two things. First, the full page snapshot is written to a file on disk, not dumped into stdout. The agent reads it only if it needs to. Second, every action prints the equivalent Playwright TypeScript - codegen is built into the interaction loop.

Why a CLI instead of MCP

Playwright MCP is excellent, but it has a structural cost: the MCP client loads every tool schema into the model context up front, and tool results (including accessibility snapshots) flow back through the context window on every step.

Playwright CLI flips that model:

No tool schemas in context. The agent discovers commands from a skill file or --help, on demand.
Snapshots live on disk. The agent gets a file path and a compact page summary. The heavy YAML representation is a local artifact, read selectively with grep or partial reads.
Commands are composable. --raw output pipes into jq, diff, or files like any Unix tool.

The Playwright team is explicit about the trade-off in the README: MCP still wins for long-running autonomous loops that benefit from persistent protocol state and rich introspection. For coding agents juggling a large codebase, tests, and a browser inside one context window, the CLI is the better fit.

This mirrors a broader shift we wrote about in building the testing interface for agents: the winning interfaces for agents look less like APIs and more like tools a human would use in a terminal.

The repo is a 21-line shim

Here is the fun part. Clone microsoft/playwright-cli and look for the implementation - there isn't one. The entire published binary is a single 21-line file, playwright-cli.js.

The real code ships inside playwright-core itself, under lib/tools/. The GitHub repo is packaging: the skill files, integration tests, and docs. That tells you how Microsoft thinks about this - the CLI is not a side project, it is a first-class frontend of Playwright core, versioned and released with it.

What the repo does own is the skills/ directory, and that turns out to be half the product. More on that below.

The daemon architecture

A CLI that talks to a browser has an obvious problem: browsers take seconds to launch, and a process-per-command model would pay that cost every time. Playwright CLI solves it with a per-session daemon.

When you run playwright-cli open, the client:

Spawns a detached Node process running cliDaemon.js - this daemon launches and owns the actual browser context.
The daemon creates a Unix domain socket (named pipe on Windows) and writes a <session>.session config file with the socket path, browser info, and version.
The client waits for the message "Daemon listening on" to appear in the daemon's stdout, then disconnects and exits.

Every subsequent command is cheap: connect to the socket, send one JSON message, print the response, exit.

Sessions are scoped to a workspace. The client hashes your project directory (it walks up looking for a .playwright folder) so the default session in project A never collides with project B. Named sessions via -s=name give you parallel isolated browsers within one workspace - and you can pin an agent to one with the PLAYWRIGHT_CLI_SESSION environment variable.

The daemon dies when the browser closes, and stale session files are cleaned up lazily when list fails to connect. There is also an escape hatch - kill-all literally greps the process table for daemon script names and sends SIGKILL.
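To make the workspace scoping concrete, here is a minimal sketch of the idea, not the code from playwright-core. Every name in it (findWorkspaceRoot, sessionKey, hasMarker) is hypothetical, the file-system check is injected as a function so the sketch runs anywhere, and FNV-1a is a stand-in for whatever hash the real client uses.

```typescript
// Sketch of workspace-scoped session naming. All names are hypothetical;
// the real client ships inside playwright-core and may differ in every detail.

// Walk up from an absolute POSIX path until a directory that contains a
// ".playwright" folder is found. `hasMarker` answers that question for one
// directory; it is injected so the sketch makes no file-system calls.
function findWorkspaceRoot(
  startDir: string,
  hasMarker: (dir: string) => boolean
): string {
  let dir = startDir;
  while (true) {
    if (hasMarker(dir)) return dir;
    const cut = dir.lastIndexOf("/");
    if (cut < 0) return startDir; // not an absolute POSIX path: give up
    const parent = cut === 0 ? "/" : dir.slice(0, cut);
    if (parent === dir) return startDir; // reached "/": fall back to the start
    dir = parent;
  }
}

// FNV-1a (32-bit), used here only as a stand-in hash function.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// A session is identified by the hashed workspace root plus a session name.
function sessionKey(workspaceRoot: string, sessionName: string = "default"): string {
  return `${fnv1a(workspaceRoot)}-${sessionName}`;
}

// Two illustrative projects, each with a ".playwright" folder at its root.
const dirsWithMarker = new Set(["/home/me/project-a", "/home/me/project-b"]);
const hasMarker = (dir: string): boolean => dirsWithMarker.has(dir);

const rootA = findWorkspaceRoot("/home/me/project-a/src/ui", hasMarker);
const rootB = findWorkspaceRoot("/home/me/project-b", hasMarker);

console.log(rootA, sessionKey(rootA));
console.log(rootB, sessionKey(rootB, "checkout"));
```

Running a command from any subdirectory of project A resolves to the same key, which is what lets separate process invocations find the same daemon, while project B gets a different key for its own default session.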

It's the MCP tool layer with a new face

The most interesting internal detail: the daemon does not reimplement browser automation. The place to look is the daemon's socket handler in packages/playwright-core/src/tools/cli-daemon/daemon.ts, which hands each incoming command to an existing tool layer.

BrowserBackend and browserTools are the exact same tool registry that powers the Playwright MCP server. Each CLI command is declared with zod schemas for its args and options, plus a mapping to an MCP tool name.

So Playwright CLI and Playwright MCP are two transports over one implementation. Same tools, same behaviors, same snapshot format - the difference is purely how much of it ends up in the model's context window. There are 87 commands in the current help registry, which is already a larger surface than the original MCP server exposed.
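The "two transports, one implementation" shape is easy to sketch. What follows is an illustration under our own assumptions, not code from playwright-core: the types, the requireArg helper, and the particular command-to-tool pairings are made up for the example, and a hand-written check stands in for the zod schemas.

```typescript
// Sketch of two frontends sharing one tool registry. Names and mappings
// are illustrative assumptions, not copied from playwright-core.

type ToolArgs = Record<string, string>;
type Tool = (args: ToolArgs) => string;

// The shared tool layer: both transports end up calling these functions.
const tools: Record<string, Tool> = {
  browser_click: (args) => `clicked ${args.ref}`,
  browser_snapshot: () => "snapshot written",
};

// A CLI command declares how to turn positional args into tool args,
// plus the name of the tool it maps to.
interface CliCommand {
  toolName: string;
  parse: (positional: string[]) => ToolArgs;
}

// Hand-written stand-in for schema validation of a required argument.
function requireArg(positional: string[], index: number, label: string): string {
  const value = positional[index];
  if (value === undefined) throw new Error(`missing argument: ${label}`);
  return value;
}

const commands: Record<string, CliCommand> = {
  click: {
    toolName: "browser_click",
    parse: (positional) => ({ ref: requireArg(positional, 0, "ref") }),
  },
  snapshot: { toolName: "browser_snapshot", parse: () => ({}) },
};

// MCP-style entry point: a tool name plus structured arguments.
function callTool(toolName: string, args: ToolArgs): string {
  const tool = tools[toolName];
  if (!tool) throw new Error(`unknown tool: ${toolName}`);
  return tool(args);
}

// CLI-style entry point: argv in, text out, routed through the same tools.
function runCli(argv: string[]): string {
  const commandName = argv[0] ?? "";
  const command = commands[commandName];
  if (!command) throw new Error(`unknown command: ${commandName}`);
  return callTool(command.toolName, command.parse(argv.slice(1)));
}

console.log(runCli(["click", "e15"]));
console.log(callTool("browser_click", { ref: "e15" }));
```

Both calls print the same line because they reach the same function. The only thing that changes between the two frontends is how the request arrives and how much surrounding schema the model has to carry in context.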

How refs work

Every snapshot is an ARIA accessibility tree in YAML, with each element tagged with a stable ref:

When you run playwright-cli click e8, the daemon resolves the ref through Playwright's internal aria-ref= selector engine - the same mechanism MCP uses. Refs are tied to the most recent snapshot, which is why the CLI re-snapshots after navigation.
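To show why refs make cheap handles, here is a small sketch that indexes a snapshot by ref. The snapshot text is an approximation we wrote for illustration rather than captured output, and indexRefs is a hypothetical helper, not part of the CLI.

```typescript
// Illustrative snapshot text: an approximation of the YAML format.
const snapshot = [
  '- main [ref=e2]:',
  '  - heading "Sign in" [level=1] [ref=e3]',
  '  - textbox "Email" [ref=e5]',
  '  - button "Continue" [ref=e8]',
].join("\n");

interface RefEntry {
  role: string;
  name: string | null; // accessible name, when the line carries one
}

// Build a ref -> element lookup from lines shaped like
//   - role "accessible name" [attr] [ref=eN]
function indexRefs(snapshotYaml: string): Map<string, RefEntry> {
  const index = new Map<string, RefEntry>();
  const linePattern = /^\s*- (\w+)(?: "([^"]*)")?.*\[ref=(e\d+)\]/;
  for (const line of snapshotYaml.split("\n")) {
    const match = linePattern.exec(line);
    if (!match) continue; // skip lines without a ref
    index.set(match[3], { role: match[1], name: match[2] ?? null });
  }
  return index;
}

const refs = indexRefs(snapshot);
console.log(refs.get("e8")); // { role: 'button', name: 'Continue' }
```

The ref is just a short key into the snapshot the agent already read. The role and accessible name stored behind it are the same ingredients a role-based locator such as getByRole('button', { name: 'Continue' }) is built from.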

You are not locked into refs, though. The target argument accepts three forms: a snapshot ref, a CSS selector, or a Playwright locator expression.

If the target parses as a CSS selector or a Playwright locator expression, it is evaluated as such; otherwise it is treated as a ref. This matters for agents: refs are deterministic against the snapshot the agent just read, while role-based locators are what you want in the generated test code. The CLI even bridges the two - generate-locator e15 converts a ref into a proper getByRole(...) locator.

Skills are half the product

playwright-cli install --skills copies a SKILL.md plus ten reference guides into your project. The skill is the agent-facing manual: command catalog, examples, and pointers to task-specific references that are loaded only when needed - request mocking, tracing, storage state, video recording, session management.

The standout is spec-driven-testing.md, which encodes a complete plan → generate → heal workflow:

Plan - run the app through a seed test with npx playwright test --debug=cli, attach with playwright-cli attach tw-XXXX, explore the live page, and write a markdown spec of scenarios.
Generate - walk each spec scenario against the live app; every CLI action emits the Playwright TypeScript that becomes the test body.
Heal - when a test fails, attach to the paused test run, diagnose with snapshot, console, and requests, fix the locator or assertion, and reconcile the spec.

This is Microsoft shipping a QA methodology as prompt files. The --debug=cli integration is particularly clever: the CLI can attach to a paused Playwright test and drive it interactively, so generated tests inherit the project's real fixtures and setup instead of starting from a bare goto.

Practical tips

A few things we found useful when running agents against it:

Watch your agents work. playwright-cli show opens a dashboard with a live screencast grid of every running session. Click into a session to take over mouse and keyboard - useful when an agent gets stuck on a CAPTCHA or an OAuth screen.

Persist auth once. The default profile is in-memory. Log in manually in a headed session, then save and reuse the state:

This is the same storageState pattern we covered in handling authentication in Playwright tests - the CLI just makes it interactive.
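Saved state goes stale, so it can be worth checking a state file before reusing it. The sketch below assumes the standard Playwright storageState JSON shape (a cookies array and an origins array). The helper hasLiveCookie, the cookie name "session", and the domain are our own illustrative choices.

```typescript
// Shape of a Playwright storageState file, reduced to the fields used here.
interface StoredCookie {
  name: string;
  domain: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
}

interface StorageState {
  cookies: StoredCookie[];
  origins: { origin: string; localStorage: { name: string; value: string }[] }[];
}

// Hypothetical helper: is the named cookie still usable at `nowSeconds`?
function hasLiveCookie(
  state: StorageState,
  cookieName: string,
  nowSeconds: number
): boolean {
  return state.cookies.some(
    (cookie) =>
      cookie.name === cookieName &&
      (cookie.expires === -1 || cookie.expires > nowSeconds)
  );
}

// Illustrative saved state with a single auth cookie.
const savedState: StorageState = {
  cookies: [{ name: "session", domain: "app.example.com", expires: 1900000000 }],
  origins: [],
};

console.log(hasLiveCookie(savedState, "session", 1800000000)); // true
console.log(hasLiveCookie(savedState, "session", 2000000000)); // false
```

In practice you would parse the saved JSON file and pass Date.now() / 1000 as the current time. If the check fails, log in again in a headed session and save a fresh state rather than letting the agent debug a login wall.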

Use --raw for scripting. It strips the page status and snapshot sections, leaving only the result value:

Limit snapshot depth on big pages. playwright-cli snapshot --depth=4 keeps the YAML small, then snapshot e34 drills into a subtree. This is the progressive-disclosure pattern applied to the page itself.
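What depth limiting and subtree drilling mean for the YAML can be sketched in a few lines. This is our own illustration of the idea, not how the CLI implements it: it assumes two spaces of indentation per nesting level, counts top-level items as depth 1, and uses made-up snapshot text.

```typescript
// Illustrative snapshot text: an approximation of the YAML format.
const sample = [
  '- main [ref=e2]:',
  '  - list [ref=e34]:',
  '    - listitem [ref=e35]:',
  '      - link "Docs" [ref=e36]',
  '  - button "Load more" [ref=e40]',
].join("\n");

// Number of leading spaces on a line.
function indentOf(line: string): number {
  return line.length - line.replace(/^ +/, "").length;
}

// Keep only lines nested at most `maxDepth` levels deep
// (assumption: two spaces per level, top-level items are depth 1).
function limitDepth(snapshotYaml: string, maxDepth: number): string {
  return snapshotYaml
    .split("\n")
    .filter((line) => indentOf(line) / 2 < maxDepth)
    .join("\n");
}

// Return the line carrying `ref` plus every more deeply indented line
// that immediately follows it, i.e. that element's subtree.
function subtree(snapshotYaml: string, ref: string): string {
  const lines = snapshotYaml.split("\n");
  const start = lines.findIndex((line) => line.includes(`[ref=${ref}]`));
  if (start === -1) return "";
  const baseIndent = indentOf(lines[start]);
  const result = [lines[start]];
  for (let i = start + 1; i < lines.length && indentOf(lines[i]) > baseIndent; i++) {
    result.push(lines[i]);
  }
  return result.join("\n");
}

console.log(limitDepth(sample, 2)); // main, list, and button lines only
console.log(subtree(sample, "e34")); // the list and everything under it
```

The agent first reads the shallow view to find the region it cares about, then pays for detail only inside that region.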

💡 The CLI reads .playwright/cli.config.json from your project root automatically - viewport, allowed origins, timeouts, and test id attributes all belong there rather than in per-command flags.

Where TesterArmy fits

Playwright CLI is the right interface for an agent that lives in your editor: it explores a page, generates a test, heals a broken locator. We use the same architectural ideas at TesterArmy - accessibility snapshots over screenshots, artifacts on disk over context stuffing - because they are simply the correct way to put an agent in front of a browser.

The gap is everything around the loop. A CLI session on your laptop does not give you scheduled regression runs, parallel execution across browsers, PR-triggered tests against preview deployments, or results your whole team can see. That is the layer TesterArmy provides:

Write tests as plain markdown instead of maintaining generated TypeScript
Run them automatically on every Vercel preview deployment or PR
Get screenshots, videos, and pass/fail evidence posted back to the pull request
Let the exploration agent cover the flows nobody wrote specs for

None of this replaces Playwright or its CLI. Use playwright-cli while you build; use TesterArmy to make sure the flows it helped you build stay working after you merge.

That's a wrap

Playwright CLI is a small amount of new code wrapped around a very good existing core: a 21-line shim, a per-session daemon over a Unix socket, and a command parser that maps shell invocations onto the same tool layer that powers Playwright MCP. The genuinely new ideas are in the interaction design - snapshots as files, refs as cheap handles, codegen on every action, and skills as the documentation an agent actually reads.

If you are running coding agents against web apps, install it and point your agent at playwright-cli --help. And when those flows need to keep passing after the agent moves on, give TesterArmy a try.

