AI won't replace QA engineers. But QA engineers who use AI will outpace those who don't. Here's how I've integrated Claude Code into my daily Playwright workflow — and what actually works versus what sounds good in theory.

What AI Does Well in Testing

Generating boilerplate — Page Object Models, fixture setups, and config files are highly structured and repetitive. Claude generates them correctly in seconds.
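This is the kind of structure Claude produces well. A minimal sketch of a Page Object Model — the `LoginPage` class and its fields are hypothetical, and the `Page`/`Locator` types are small stubs standing in for Playwright's so the example is self-contained:

```typescript
// Stub types mirroring the shape of Playwright's Page/Locator, so this
// sketch runs without importing @playwright/test.
type Locator = {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
};
type Page = {
  goto(url: string): Promise<void>;
  getByLabel(text: string): Locator;
  getByRole(role: string, options?: { name?: string }): Locator;
};

// Hypothetical POM: in a real suite this lives in its own exported module.
class LoginPage {
  constructor(private readonly page: Page) {}

  async goto() {
    await this.page.goto('/login');
  }

  async login(email: string, password: string) {
    await this.page.getByLabel('Email').fill(email);
    await this.page.getByLabel('Password').fill(password);
    await this.page.getByRole('button', { name: 'Sign in' }).click();
  }
}
```

None of this is intellectually hard; it's just ceremony, which is exactly why it's a good AI task.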

Suggesting edge cases — Given a feature description, Claude surfaces negative paths, boundary conditions, and accessibility concerns I might miss on the first pass.

Refactoring brittle tests — Pasting a test that uses fragile CSS selectors and asking Claude to rewrite it with getByRole and getByTestId is a massive time saver.
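A before/after sketch of that refactor. The form and selectors are hypothetical, and the `Page`/`Locator` types are minimal stubs in place of Playwright's so the snippet stands alone:

```typescript
// Stub types in place of Playwright's, so the comparison is self-contained.
type Locator = {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
};
type Page = {
  locator(selector: string): Locator;
  getByRole(role: string, options?: { name?: string }): Locator;
  getByTestId(id: string): Locator;
};

// Brittle: breaks the moment the DOM structure or class names change.
async function submitEmailBrittle(page: Page) {
  await page.locator('form > div:nth-child(2) > input.input--lg').fill('user@example.com');
  await page.locator('button.btn.btn--primary').click();
}

// Resilient: tied to user-visible semantics and a stable test id.
async function submitEmailResilient(page: Page) {
  await page.getByRole('textbox', { name: 'Email' }).fill('user@example.com');
  await page.getByTestId('submit').click();
}
```

The resilient version survives restyling and layout changes because it queries the page the way a user (or screen reader) would.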

Explaining failures — When a test fails with a cryptic Playwright error, describing the failure to Claude often produces the diagnosis faster than reading docs.

What AI Gets Wrong

AI doesn't know your application. It will generate syntactically correct tests against elements that don't exist in your DOM. Every AI-generated test needs human review before it runs.

It also tends toward over-assertion: checking too many things in a single test, which makes the test brittle and the failure hard to diagnose.
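To make the over-assertion problem concrete, here's a hypothetical checkout page sketched with plain functions; `expect` is a tiny stand-in for Playwright's so the comparison runs on its own:

```typescript
// Minimal stand-in for an assertion library.
function expect(actual: unknown) {
  return {
    toBe(expected: unknown) {
      if (actual !== expected) throw new Error(`expected ${expected}, got ${actual}`);
    },
  };
}

// Hypothetical state of a checkout page.
type CheckoutPage = { total: string; bannerText: string; itemCount: number };

// Over-asserted: one test checks everything, so any cosmetic change fails it
// and the failure doesn't say which behavior actually broke.
function checkoutWorks(page: CheckoutPage) {
  expect(page.itemCount).toBe(1);
  expect(page.total).toBe('$42.00');
  expect(page.bannerText).toBe('Free shipping on orders over $25');
}

// Focused: one behavior per test, so a failure names the behavior.
function showsTotal(page: CheckoutPage) {
  expect(page.total).toBe('$42.00');
}
function showsShippingBanner(page: CheckoutPage) {
  expect(page.bannerText).toBe('Free shipping on orders over $25');
}
```

When marketing rewrites the banner copy, `showsTotal` still passes and only `showsShippingBanner` fails — the split tells you exactly what changed.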

My Workflow

  1. Write the test intent in plain English
  2. Use Claude to generate the initial POM and spec
  3. Run it — it will fail (elements won't match)
  4. Correct selectors by inspecting the real DOM
  5. Run again, refine assertions
  6. Ask Claude to review for coverage gaps
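The steps above, sketched as they look in practice. The feature, title, and selectors are hypothetical, and `test` here is a tiny stand-in for Playwright's runner so the sketch stays self-contained:

```typescript
// Minimal stand-in for a test runner: collects tests instead of running them.
type TestBody = (log: string[]) => Promise<void>;
const suite: { title: string; body: TestBody }[] = [];
function test(title: string, body: TestBody) {
  suite.push({ title, body });
}

// Step 1: the plain-English intent becomes the test title.
test('signed-in user adds an item to the cart and sees the count update', async (log) => {
  // Step 2: Claude drafts the interactions. Steps 3-4: these guessed
  // selectors get corrected against the real DOM before the test is trusted.
  log.push("goto('/products')");
  log.push("getByRole('button', { name: 'Add to cart' }).click()");
  // Step 5: refine assertions down to the one outcome that matters here.
  log.push("expect(getByTestId('cart-count')).toHaveText('1')");
});
```

The title carries the domain knowledge; everything inside the body is structure the AI can draft and a human can verify.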

The AI handles the structure; I handle the domain knowledge.

The 60% Time Saving

This isn't about AI writing tests for you. It's about eliminating the time spent on ceremony — imports, class structure, async patterns, fixture wiring. That scaffolding is where most test-writing time goes, and AI handles it almost entirely.

The remaining 40% — understanding the feature, deciding what is actually worth asserting, and catching the edge cases that matter — is still human work.

Conclusion

The best use of AI in testing is as an accelerator for the parts that don't require deep domain knowledge. Use it that way and your output doubles. Expect it to replace judgment and you'll spend more time fixing AI mistakes than writing tests yourself.